DESCRIPTION

GRAPHICAL USER INTERFACE FOR DISPLAY 
AND ANALYSIS OF BIOLOGICAL SEQUENCE DATA

COPYRIGHT NOTICE 
A portion of the disclosure of this patent document, including Appendices, contains 
material which is subject to copyright protection. The copyright owner has no objection to the 
facsimile reproduction by anyone of the patent document or the patent disclosure as it appears
in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright 
rights whatsoever. 


Cross-Reference to Related Applications 
This application claims priority from United States provisional application, serial 
number 60/154,149, filed September 14, 1999, and United States patent application serial 
number 09/397,335, filed September 14, 1999, the disclosures of which are incorporated herein
by reference in their entirety.

Technical Field

The present invention relates to a computer research tool for searching and displaying 
biological data. More specifically, the invention relates to a computer research tool utilizing a 
novel graphical user interface (GUI) for performing computerized research of biological data
from various databases and for providing enhanced graphical representation of biological data, 
progressive querying, and cross-navigation of relational data. 

Background Art 

Trillions of pieces of information generated by emerging technologies in molecular
biology and genetics are stored digitally in computer databases worldwide. In fact, every day
approximately 2000 nucleotide sequences are deposited in publicly accessible databases. With
such a large amount of multidimensional data, researchers rely on complex information systems
to find, summarize and interpret this biological information. This has resulted in the creation
of a new field of science known as bioinformatics, which combines the power of biochemistry,
mathematics, and computers. Bioinformatics has allowed the development of new databases and
computational technologies to help in the understanding of the biological meaning encoded in
vast collections of sequence data.

A. Biochemistry Overview

An understanding of the biological meaning of sequence data begins with a study of the 
20 amino acids that make up proteins. Deoxyribonucleic acid (DNA) contains the blueprints for 
these structures. DNA is composed of very long polymers of chemical sub-units known as 
nucleotides. Each nucleotide includes one of four nitrogenous bases: adenine (A), thymine (T), 
cytosine (C) and guanine (G). DNA serves as a template for ribonucleic acid (RNA), which 
serves as a template for proteins. Like DNA, RNA is also composed of nucleotides. Each RNA 
nucleotide includes one of four nitrogenous bases. These bases of RNA differ from those of DNA
only in the substitution of thymine (T) with uracil (U). Three nucleotides of DNA encode three
nucleotides of RNA, which in turn encode one amino acid of a protein. 
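
By way of illustration only, the flow from DNA to RNA to protein described above can be sketched in Python. The function names are hypothetical, the sketch assumes the standard genetic code, and it assumes the input is the coding strand, so that transcription reduces to replacing T with U.

```python
from itertools import product

# Standard genetic code, listed in U, C, A, G order for each codon position.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def transcribe(dna):
    """Return the RNA equivalent of a DNA coding strand (T replaced by U)."""
    return dna.upper().replace("T", "U")


def translate(rna):
    """Translate RNA codon by codon, stopping at the first stop codon ('*')."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


rna = transcribe("ATGGCCTAA")
protein = translate(rna)
```

Each group of three nucleotides yields one amino acid, so the nine-base input above produces two residues followed by a stop codon.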

Proteins are macromolecules (i.e., polymers of covalently bonded amino acids) which show
great diversity in physical properties, thereby fulfilling a broad range of biological functions.
A protein's structure and function depend upon its amino acid sequence,
which is determined by the nucleotide sequence of the RNA which produced it, which is in turn
determined by the nucleotide sequence of the DNA that produced the RNA. Hence, the great
diversity observed in the sequence of amino acids is the direct result of the many possible
permutations of DNA and RNA. The primary structure is the sequence of amino acids covalently
bonded together. The secondary structure is the result of bonding within the amino acid sequence
of the polypeptide; this bonding causes the chain to develop specific shapes (alpha helix, beta
sheet). The tertiary structure is the 3-dimensional folding of the alpha helix or the pleated
sheet. The quaternary structure is the spatial relationship between the different polypeptides
in the protein.

B. Sequence Comparison

Sequence comparison is a very powerful tool in molecular biology, genetics and protein
chemistry. Frequently, it is unknown for which proteins a new DNA sequence codes or if it
codes for any protein at all. If a new coding sequence is compared with all known sequences,
there is a high probability of finding a similar sequence. Usually one tries to determine what level
of similarity is shared between the proteins in terms of structural and functional characteristics.
This determination is made by comparing the amino acid sequences of the proteins. It has been
observed that the primary structures of a given protein from related species closely resemble one
another. Comparisons of the primary structures of homologous proteins (evolutionarily related
proteins) indicate which of the proteins' amino acid residues or domains (i.e., stretches of amino
acids) are essential to its function, which are of lesser significance, and which have little specific



WO 01/20535 



PCT/US00/25247 



function. Sequences which are found in similar positions of functionally similar proteins are
said to be homologous, conservatively substituted or highly conserved. A popular computational
tool for rapid comparison of a search sequence to a database of known sequences is the BLAST
search. The advantage of a BLAST search is the ability to find matches to distantly related
sequences. The disadvantage is that the searches become computationally intensive and may
take an inordinate length of time.
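
To make the idea of comparing a search sequence against a database concrete, the following Python sketch scores a query against each database entry with a simple local alignment (Smith-Waterman style) and ranks the entries. This is not the BLAST algorithm, which uses heuristics to avoid exactly this exhaustive computation. The scoring values, function names and sample sequences are assumptions chosen for illustration.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor of 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def rank_database(query, database):
    """Return (score, name) pairs, highest scoring entry first."""
    scored = [(local_alignment_score(query, seq), name)
              for name, seq in database.items()]
    return sorted(scored, reverse=True)


# Hypothetical database entries.
ranking = rank_database("MKVL", {"p1": "AAMKVLGG", "p2": "GGGGGG"})
```

Because every query residue is compared with every database residue, the cost grows with the product of the sequence lengths, which is why exhaustive comparison becomes slow on large databases.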
C. Biological Databases 

In order to perform these sequence comparisons, databases of known biological data
need to be accessed. There are many different data banks (databases) where biological
information such as DNA and protein sequence data are stored, including general biological
databanks such as EMBL/GenBank (nucleotide sequences), SWISS-PROT (protein
sequences), and PDB/Protein Data Bank (protein structures). [See, e.g., "Comprehensive,
Comprehensible, Distributed and Intelligent Databases: Current Status" by Frishman, et al.
Bioinformatics Review, Vol. 14, No. 7, 1998, pgs. 551-561, incorporated herein by reference].
Specifically, GenBank is an annotated collection of all publicly available DNA sequences.
As of August 1999 there were approximately 3,400,000,000 bases in 4,610,000 sequence
records. The GenBank database comprises the DNA DataBank of Japan (DDBJ), the European
Molecular Biology Laboratory (EMBL), and GenBank at NCBI. SWISS-PROT is an annotated
protein sequence database maintained by the Department of Medical Biochemistry of the
University of Geneva. The PDB/Protein Data Bank, maintained by Brookhaven National
Laboratory, contains all publicly available solved protein structures. These databases contain
large amounts of raw sequence data which can be cumbersome to use.

In an effort to provide a more useful form of biological data, there are a number of
derived or structured databases which integrate information from multiple primary sources, and
may include relational/cross-referenced data with respect to sequence, structure, function, and
evolution. A derived database generally contains added descriptive materials on top of the
primary data or provides novel structuring of the data based on certain defined relationships.
Derived/structured databases typically structure the protein sequence data into usable sets of
data (tables), grouping the protein sequences by family or by homology domains. A protein
family is a group of sequences that can be aligned from end to end and are <55% different
globally. A homologous domain is a subsequence of a protein that is distinguished by a
well-defined set of properties or characteristics and may also occur in at least two different
subfamilies.

An example of a structured database is ProDom, a protein domain database consisting of
an automatic compilation of homologous domains. The database was designed as a tool to help
analyze domain arrangements of proteins and protein families. Current versions of the ProDom
database are built using a procedure based on recursive PSI-BLAST searches. ProDom contains
57,976 domain families, sorted by decreasing number of protein sequences in the families.
ProDom is generated from the SWISS-PROT database by automated sequence comparison.

Similarly, DOMO is a database of homologous protein domain families. DOMO was 
obtained from successive sequence analysis steps including similarity search, domain 
delineation, multiple sequence alignment, and motif construction. DOMO has analyzed 83,054 
non-redundant protein sequences from SWISS-PROT and PIR-International Sequence Database,
yielding a database of 99,058 domain clusters into 8,877 multiple sequence alignments. 

Another derived protein sequence database is the Block Database. Blocks are multiply
aligned ungapped segments corresponding to the most highly conserved regions of proteins. The
blocks for the Block Database are made automatically by looking for the most highly conserved
regions in groups of proteins documented in the Prosite Database. The blocks are then calibrated
against the SWISS-PROT database to obtain a measure of the chance distribution of matches.

D. Researching Biological Databases

Typically, biological databases may be searched by either an unstructured (keyword) or
structured (field based) search. An unstructured search of the database is performed by
searching for a keyword or the ID of records. For example, a keyword search of "ecoli"
retrieves a list of protein sequences that are identified by the keyword "ecoli". A structured
search is a more deliberate search, allowing, for example, the searching of the database for
protein sequences which contain a particular sequence of interest.
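
The distinction between the two kinds of search can be sketched as follows. The record layout, identifiers and sequences are hypothetical and stand in for entries of a real database. The keyword search matches a record ID or keyword, while the field based search looks inside the sequence field for a subsequence of interest.

```python
# Hypothetical records; a real database would hold many more fields.
RECORDS = [
    {"id": "P1", "keywords": {"ecoli", "kinase"}, "sequence": "MKVLAAGIT"},
    {"id": "P2", "keywords": {"yeast"}, "sequence": "MSTAAGIQ"},
    {"id": "P3", "keywords": {"ecoli"}, "sequence": "MPLNNQ"},
]


def keyword_search(records, term):
    """Unstructured search: match the term against record IDs and keywords."""
    term = term.lower()
    return [r["id"] for r in records
            if term == r["id"].lower() or term in r["keywords"]]


def sequence_field_search(records, motif):
    """Structured search: match only within the sequence field."""
    return [r["id"] for r in records if motif in r["sequence"]]


by_keyword = keyword_search(RECORDS, "ecoli")
by_sequence = sequence_field_search(RECORDS, "AAGI")
```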

An example of a well known search engine to conduct research on the GenBank
database is the ENTREZ search engine which utilizes keyword searching. If a search results in
too many hits, ENTREZ allows the addition of new search terms to progressively narrow the
number of hits. A researcher may then select all or a subset of the entries that match the search
for display to generate a summary page that reports on each of the selected entries. The search
results may be displayed in a variety of formats or standardized reports. The form that most
biologists are familiar with, the GenBank report, shows the raw GenBank entry. Other familiar
formats include FASTA and ASN.1. The genomes division of ENTREZ has a graphic interface
based on alignments among multiple maps. The display image shows a series of genetic and
physical maps published from a variety of sources, roughly aligned, with diagonal lines
connecting common features.

Another search system is SRS (Sequence Retrieval System) which is a Web-based
system for searching among multiple sequence databases supported by EMBL. The SRS
cross-references sequence information from approximately [illegible] other sequence databases including ones
that hold protein and nucleotide sequence information, 3D structure, disease and phenotype
information, and functional information. The SRS search allows structured queries on one or
more databases with common fields (e.g., ID, AccNumber, Description). SRS displays the
results as a series of hypertext links. The search can be broadened to other databases by bringing
in cross-references.

A number of patents exist which relate to display and analysis of biological sequence
data, including U.S. Patent Nos. 5891632; 5884230; 5878373; 5873052; 5873052; 5864488;
5856928; 5842151; 5799301; 5795716; 5724605; 5724253; 5706498; 5701256; 5600826;
5598350; 5595877; 5577249; 5557535; 5524240; 5453937; 5187775; 4939666; 4923808;
4771384; and 4704692; and PCT Patent No. WO96/23078; all of which are incorporated herein
by reference.

E. Graphical User Interfaces (GUIs) 

The development and proliferation of GUIs has greatly enhanced the ease with which
users interact with biological databases both in the searching stage and in the display of
information. A conventional GUI display includes a desktop metaphor upon which one or more
icons, application windows, or other graphical objects are displayed. Typically, a data
processing system user interacts with a GUI display utilizing a graphical pointer, which the user
controls with a graphical pointing device, such as a mouse, trackball, or joystick. For example,
depending upon the actions allowed by the active application or operating system software, the
user can select icons or other graphical objects within the GUI display by positioning the
graphical pointer over the graphical object and depressing a button associated with the graphical
pointing device. In addition, the user can typically relocate icons, application windows, and
other graphical objects on the desktop utilizing the well known drag-and-drop techniques. By
manipulating the graphical objects within the GUI display, the user can control the underlying
hardware devices and software objects represented by the graphical objects in a graphical and
intuitive manner.

User interfaces used with multi-tasking processors also allow the user to
work on many tasks at once, each task being confined to its own display window. The interface
allows the presentation of multiple windows in potentially overlapping relationships on a display
screen. The user can thus retain a window on the screen while temporarily superimposing a
further window entirely or partially overlapping the retained window. This enables the user to
divert attention from a first window to one or more secondary windows for assistance and/or
references, so that overall user interaction may be improved. There may be many windows
with active applications running at once. Oftentimes, the windows may be (dynamically or
statically) related such that modifying a query in one window results in changes to the displayed
data in the other related windows, thereby "propagating" the changes throughout.

There are a number of patents which relate to graphical user interfaces. For example,
the following patents, which relate to graphical user interfaces, albeit not for biological data,
are incorporated by reference herein: U.S. Patent No. 5,926,806 to Marshall et al.; U.S. Patent
No. 5,544,352 to Egger; U.S. Patent No. 5,777,616 to Bates et al.; U.S. Patent No. 5,812,804 to
Bates et al.; U.S. Patent No. 5,146,556 to Hullot et al.; U.S. Patent No. 5,893,082 to McCormick;
U.S. Patent No. 5,815,151 to Argiolas; U.S. Patent No. 5,911,138 to Li; U.S. Patent No.
5,761,656 to Ben-Shachar; U.S. Patent No. 5,404,442 to Foster et al.; U.S. Patent No. 5,917,492
to Bereiter et al.; U.S. Patent No. 4,710,763 to Franke et al.; U.S. Patent No. 5,828,376 to
Solimene et al.; U.S. Patent No. 5,748,927 to Stein et al.; U.S. Patent No. 5,452,416 to Hilton
et al.; and U.S. Patent No. 5,721,900 to Banning.

F. Graphical User Interfaces for Biological Data Systems 

As in most industries, software user interfaces for biological data have evolved from the
former DOS text and command line interfaces to intuitive screen graphics which represent data
in a user-friendly manner. In order to evaluate and analyze data sequences from various
biological databases, researchers often utilize graphical user interfaces to view biological data
in a variety of ways, including multiple sequence alignments (MSAs), secondary structure
predictions, two-dimensional graphical representations of sequences, and phylogenetic trees.

A multiple sequence alignment displays the alignment of homologous residues among
a set of sequences in columns. In a 2D graphical representation, sequences are displayed as
schematic boxes wherein each box is spatially oriented. Phylogenetic trees are genealogical
trees which are built up with information gained from the comparison of the amino acid
sequences in a protein. The phylogenetic tree (rooted or unrooted) is a graphical representation
of the evolutionary distance between individual protein sequences in a family of proteins. The
branch lengths of the phylogenetic tree represent evolutionary distances derived from the PAM
matrix, an evolutionary model that assumes that estimation of mutation rates for closely related
proteins can be extrapolated to distant relationships.
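
As a small illustration of the column-wise nature of a multiple sequence alignment, the following Python sketch marks the columns in which every sequence carries the same residue. The alignment, the gap symbol "-" and the function name are assumptions made for the example and are not taken from any particular alignment tool.

```python
def conserved_columns(alignment):
    """Return a marker line: '*' under fully conserved columns, ' ' elsewhere."""
    marks = []
    for column in zip(*alignment):  # iterate over alignment columns
        residues = set(column)
        conserved = len(residues) == 1 and "-" not in residues
        marks.append("*" if conserved else " ")
    return "".join(marks)


# Hypothetical alignment of three sequences; '-' denotes a gap.
alignment = ["MKV-LA", "MKVALA", "MRV-LA"]
marker_line = conserved_columns(alignment)
```

Printing the sequences one per line followed by the marker line gives the familiar display in which homologous residues appear stacked in columns.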

A good example of a graphical user interface can be found in the ProDom interface. The
output from a ProDom query for proteins sharing a homologous domain with a particular
sequence may be displayed as 2D graphic representations, summarized alignments and trees,
alignment in MSF format, and 3D structures. Specifically, the 2D graphical view presents
domain arrangements for proteins sharing homology by showing each protein on a single line,
starting with its name, hypertext-linked to SWISS-PROT, followed by a 2D view of schematic
boxes, each box hypertext-linked to corresponding ProDom entries.

The limitation of most of these systems is that the graphical displays are both static and
unrelated. A graphical display is static when a user is unable to refine or modify the
search criteria from within the graphical display. Graphical displays are unrelated when,
upon a user modifying one graphical display for a particular search, the remaining graphical displays
for that search are not correspondingly modified (i.e., no propagation). These
limitations can make the analysis of protein sequences cumbersome and time consuming.
Additionally, the inflexibility of these systems could result in an incorrect or incomplete analysis
by limiting a user's ability to view all possible relationships.

Accordingly, there is a need in the art for a user-friendly computerized research tool for
biological data to provide more effective ways to retrieve and view interrelated information from
a database. This system needs to provide a usable display for representing vast amounts of
discrete information, permitting researchers to focus on the most relevant materials and discover
new functional relationships. To effectively and efficiently analyze the information, there
remains a need for a graphical user interface which provides increased flexibility by permitting
the user to view any number of related or unrelated data displays from one or more databases
at the same time. These displays need to be interlinked such that the selection of one or more
entries in one of the display windows causes the other display windows to distinguishably
display and act on those entries related to the selection. Progressive querying is also needed to
allow the user to quickly discover new relationships based on the results of previous queries.
The present invention is designed to address these needs.

DISCLOSURE OF THE INVENTION 
Broadly speaking, the invention is a computer research tool for searching and displaying
biological data. Specifically, the invention provides a computer research tool for performing
computerized research of biological data from various databases and for providing a novel
graphical user interface that significantly enhances biological data representation, progressive
querying and cross-navigation of windows and databases.

The invention can be implemented in numerous ways, including as a system, a device,
a method, or a computer readable medium. Several embodiments of the invention are discussed
below.

As a computer system, an embodiment of the invention includes a database containing
tables of data, a display device and a processor unit. The display device has a plurality of
display areas (windows). The processor unit operates to access the database to retrieve the data
from the corresponding associated tables and then display the retrieved data in the display areas.
The processor unit also detects when a selection associated with one of the display areas is made
and thereafter automatically modifies the data being displayed in the other display areas in
accordance with the selection. Selection is made in a graphically distinct manner. Changes in
certain selections, including scale and limits, propagate throughout.
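
One way such propagation between display areas might be organized is sketched below in Python. The class names, the central hub and the sequence identifiers are all hypothetical. The specification does not prescribe this structure, and the sketch only shows a selection being pushed to every registered display area.

```python
class DisplayArea:
    """A window showing some sequence IDs, a subset of which is highlighted."""

    def __init__(self, name, rows):
        self.name = name
        self.rows = rows          # IDs this display area can show
        self.highlighted = set()  # IDs drawn in a graphically distinct manner

    def on_selection(self, selected_ids):
        # Highlight only those selected entries that this area displays.
        self.highlighted = {r for r in self.rows if r in selected_ids}


class SelectionHub:
    """Propagates a selection to every registered display area."""

    def __init__(self):
        self.areas = []

    def register(self, area):
        self.areas.append(area)

    def select(self, selected_ids):
        selected = set(selected_ids)
        for area in self.areas:
            area.on_selection(selected)


hub = SelectionHub()
summary = DisplayArea("family summary", ["seqA", "seqB", "seqC"])
tree = DisplayArea("tree", ["seqB", "seqC", "seqD"])
hub.register(summary)
hub.register(tree)
hub.select(["seqB"])  # both areas now highlight seqB
```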

As a graphical user interface (GUI) for a display screen of a computer, an embodiment
of the invention includes a number of display areas ("windows") for searching and displaying
biological data which are interlinked for ease of navigation. A variety of formats for searching
and displaying biological data is provided. Searches can be performed by keyword, sequence
listing or family identifier (module ID). Sequence search results are graphically displayed
showing the relationship between the probe sequence chosen for the query and each of the
families that are related with their associated modules. Keyword search results are graphically
displayed showing all sequences having the requested keyword along with all modules for the
currently selected catalog. The module of interest may then be selected which results in a
summary window for the family associated with the module. The family summary window
display provides a two dimensional spatial orientation of the biological data, including visual
locations of modules (represented as schematic boxes) in each of the sequences for a selected
family distinguishably displayed and positionally aligned as well as the location of all other
modules in those sequences. Another results display provides the user with the associated
multiple sequence alignment of biological data and secondary structure predictions (Vparse,
Score, PredSI, PredSec). A further results display provides the user with the associated
phylogenetic tree (rooted or unrooted). A further display provides the actual protein sequence
information for any selected member of a family.

As a method of displaying data on a display device of a computer system, the data being
obtained from a relational database associated with the computer system, the display having
"windowing" capability to provide a plurality of display areas, an embodiment of the invention
includes the operation of interlinking display areas via the database such that selections may
be propagated throughout. The method further includes multiple catalog views, browsing
through families, cross-navigation and propagation through all catalogs and sequences,
propagation through families, assigning protein function, and scaling of silent/express mutation
ratios.

As a computer readable medium containing program instructions for displaying data on
a display device of a computer system, the data being obtained from a relational database
associated with the computer system, the display having "windowing" capability to provide a
plurality of display areas, an embodiment of the invention includes: computer readable code
devices for interlinking display areas via the database such that selections may be propagated
throughout, multiple catalog views, browsing through families, cross-navigation and propagation
through all catalogs and sequences, propagation through families, assigning protein function, and
scaling of silent/express mutation ratios (kA/kS).

The advantages of the invention are numerous. One significant advantage of the
invention is that it allows a user to directly use the data returned by one or more queries as the
basis for making additional queries. By this kind of interactive and progressive query-making
activity, access to all of the information on a given topic is possible. As a result, new data
connections and relationships may be discovered. The user is able to more efficiently and
effectively review related biological information than conventionally possible.
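
Progressive querying of this kind, in which the output of one query becomes the input of the next, can be sketched as a chain of small lookups. The data, family names and function names below are hypothetical and serve only to show the chaining.

```python
# Hypothetical data: sequence -> family, and sequence -> keywords.
FAMILY_OF = {"seqA": "fam1", "seqB": "fam1", "seqC": "fam2"}
KEYWORDS = {"seqA": {"kinase"}, "seqB": {"transferase"}, "seqC": {"kinase"}}


def by_keyword(term):
    """First query: sequences annotated with the keyword."""
    return {s for s, kws in KEYWORDS.items() if term in kws}


def families_of(sequences):
    """Second query, driven by the first result: families of those sequences."""
    return {FAMILY_OF[s] for s in sequences}


def members_of(families):
    """Third query, driven by the second result: all members of those families."""
    return {s for s, f in FAMILY_OF.items() if f in families}


hits = by_keyword("kinase")
families = families_of(hits)
related = members_of(families)
```

In this toy data the final step surfaces seqB, which does not carry the original keyword but belongs to the same family as a sequence that does.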

All patents, patent applications, provisional applications, and publications referred to
or cited herein, or from which a claim for benefit of priority has been made, are incorporated
herein by reference in their entirety to the extent they are not inconsistent with the explicit
teachings of this specification.

BRIEF DESCRIPTION OF THE DRAWINGS 

FIG. 1 is an overview of the preferred embodiment of the hardware architecture for
computerized searching of biological data.

FIG. 2 depicts a classic three tier client server model utilized in a preferred embodiment
of the present invention (1-tier/Database Server, 2-tier/Application Server, 3-tier/Client).

FIG. 3 depicts the preferred client/server communication model with all three tiers.

FIG. 4A depicts the entity relationship diagram of one embodiment of the database.

FIG. 4B depicts the entity relationship diagram of another embodiment of the database.

FIG. 4C depicts the entity relationship diagram for DNA.

FIG. 5 depicts the block diagram for user related database.

FIG. 6 is a navigational flowchart of a preferred embodiment of the invention illustrating
all major windows and options available.

FIG. 7 depicts a sample screen display for the catalog selection window.

FIG. 8 depicts a sample screen display for the search by name window.

FIG. 9 depicts a sample screen display for the Module Family Summary (MFS) window.

FIG. 10 depicts a sample screen display for the search by sequence window.

FIG. 11 depicts a sample screen display for the sequence search results (SSR) window.

FIG. 12 depicts a sample screen display for the search by keyword window.

FIG. 13 depicts a sample screen display for the keyword search results (KSR) window.

FIG. 14 depicts a sample screen display for the MSA window.

FIG. 15 depicts a sample screen display for the evolutionary tree window.

FIG. 16 depicts an example of consecutive screen displays for interactive and
progressive query-making activity from search by sequence.

FIG. 17 depicts an example of consecutive screen displays for interactive and
progressive query-making activity from search by keyword.

FIG. 18A depicts the MFS window for one catalog.

FIG. 18B depicts the MFS window with a second catalog selected and consecutive
screen displays for interactive and progressive query-making activity from this window.

FIG. 19 depicts the screen display for the evolutionary tree window as linked to the MFS
window and MSA window with highlighted selections propagated throughout.

FIG. 20 depicts the screen displays showing interactive and progressive query-making
activity across multiple MFS windows.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT 

Referring now to the drawings, the preferred embodiment of the present invention will
be described.

I. ARCHITECTURE

Figure 1 is an overview of the preferred embodiment of the hardware architecture for
computerized searching of biological data. The architecture preferably comprises at least two
networked computer processors (client component and server component(s)) and a database(s)
for storing biological data. The computer processors can be processors that are typically found
in personal desktop computers (e.g., IBM, Dell, Macintosh), portable computers, mainframes,
minicomputers, or other computing devices. Preferably in the networked client/server
architecture of the present invention, a classic three tier client server model is utilized as shown
in Figure 2 (1-tier/Database Server, 2-tier/Application Server, 3-tier/Client). Preferably, a
relational database management system (RDBMS), either as part of the Application Server
component or as a separate component (RDB machine), provides the interface to the database.

In a preferred database-centric client/server architecture, the client application generally
requests data and data-related services from the application server which makes requests to the
database server. The server(s) (e.g., either as part of the application server machine or a separate
RDB/relational database machine) responds to the client's requests and provides secured access
to shared data.
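
The request path through the three tiers can be sketched in Python as follows. The class names, the access check and the sample table are hypothetical. The preferred embodiment is described as Java based, and this sketch only illustrates the division of responsibility, with the client never talking to the database tier directly.

```python
class DatabaseServer:
    """Tier 1: holds the tables and answers simple criteria queries."""

    def __init__(self, tables):
        self._tables = tables

    def query(self, table, **criteria):
        return [row for row in self._tables[table]
                if all(row.get(k) == v for k, v in criteria.items())]


class ApplicationServer:
    """Tier 2: checks access, then turns client requests into database queries."""

    def __init__(self, database, allowed_users):
        self._db = database
        self._allowed = set(allowed_users)

    def sequences_in_family(self, user, family_id):
        if user not in self._allowed:
            raise PermissionError(f"{user} may not access shared data")
        rows = self._db.query("sequence", family=family_id)
        return [row["name"] for row in rows]


class Client:
    """Tier 3: only ever talks to the application server."""

    def __init__(self, user, app_server):
        self.user = user
        self._app = app_server

    def show_family(self, family_id):
        return self._app.sequences_in_family(self.user, family_id)


tables = {"sequence": [{"name": "seqA", "family": 1},
                       {"name": "seqB", "family": 2},
                       {"name": "seqC", "family": 1}]}
app = ApplicationServer(DatabaseServer(tables), allowed_users=["alice"])
result = Client("alice", app).show_family(1)
```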

II. CLIENT

More specifically, the client components are preferably complete, stand-alone personal
computers offering a full range of power and features to run applications. The client component
preferably operates under any operating system and includes communication means, input
means, storage means, and display means. The user enters input commands into the computer
processor through input means which could comprise a keyboard, mouse, or both. Alternatively,
the input means could comprise any device used to transfer information or commands. The
display comprises a computer monitor, television, LCD, LED, or any other means to convey
information to the user. In a preferred embodiment, the user interface is a graphical user
interface (GUI) written in the Java programming language (Sun Microsystems) and running
in a Java compatible browser or Java Virtual Machine (JVM). The GUI provides flexible
navigational tools to explore patterns in the evolutionary relationships between genomic
sequences. The clients and the Application Server communicate via Java's RMI (Remote Method
Invocation).
III. SERVER 

The server component(s) can be a personal computer, a minicomputer, or a mainframe
and offers data management, information sharing between clients, network administration and
security. The Database Server (RDBMS - Relational Database Management System) and the
Application Server may be the same machine or different hosts if desired. The Application
Server is preferably a Java application (JDK Ver. 1.1 or JRE) running on a supported UNIX
platform (e.g., Linux, Irix, Solaris). The Database Server is preferably SQL-capable (e.g.,
MySQL, Oracle). The Application Server and Database Server communicate via the protocol
implied by the JDBC (Java Database Connectivity) driver of the RDBMS. The Application
Server preferably completely isolates the client from any notion of relational databases; the
client's view is one of (Java) objects, not relations.
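
The idea of presenting objects rather than relations to the client can be sketched as follows. The specification describes a Java application server using JDBC. This sketch instead uses Python's standard sqlite3 module as a stand-in, and the table layout, class names and accession value are assumptions made for illustration.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class ProteinSequence:
    """The object the client sees; it carries no SQL or table details."""
    accession: str
    residues: str


class SequenceRepository:
    """Server-side mapping from relational rows to objects."""

    def __init__(self, connection):
        self._conn = connection

    def find(self, accession):
        row = self._conn.execute(
            "SELECT accession, residues FROM sequence WHERE accession = ?",
            (accession,),
        ).fetchone()
        return ProteinSequence(*row) if row else None


# Hypothetical in-memory database standing in for the Database Server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sequence (accession TEXT PRIMARY KEY, residues TEXT)")
conn.execute("INSERT INTO sequence VALUES ('X0001', 'MKVLA')")
repo = SequenceRepository(conn)
found = repo.find("X0001")
```

Only the repository knows that a table named sequence exists, so the storage layout could change without affecting client code that works with ProteinSequence objects.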

The present invention also envisions other computing arrangements for the client and
server(s), including processing on a single machine such as a mainframe, a collection of
machines, or other suitable means.

IV. CLIENT/SERVER COMMUNICATIONS 

The client and server machines work together to accomplish the processing of the
present invention. The preferred protocol between the client and server is RMI (Remote Method
Invocation for Java-to-Java communications across Virtual Machines). RMI is a standard
defined by the Java Core.

The isolation of clients from each other requires that each client gets its own server
instance as a container of all client related data, such as the database connection or query
status. A root server allows connection bootstrapping by creating server instances. The resulting
communication model with all three tiers is depicted in Figure 3.
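
The bootstrapping role of the root server can be sketched as a factory that hands each connecting client its own server instance. The class and attribute names are hypothetical, and the sketch omits the remote invocation layer (RMI in the preferred embodiment) entirely.

```python
import itertools


class ServerInstance:
    """Container for the state belonging to exactly one client."""

    def __init__(self, session_id):
        self.session_id = session_id
        self.query_status = "idle"
        self.db_connection = None  # placeholder for a per-client connection


class RootServer:
    """Bootstraps connections by creating one server instance per client."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.instances = {}

    def connect(self, client_name):
        instance = ServerInstance(next(self._ids))
        self.instances[client_name] = instance
        return instance


root = RootServer()
first = root.connect("client-1")
second = root.connect("client-2")
first.query_status = "running"  # does not affect the other client's instance
```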

V. DATABASE HARDWARE

The database is preferably connected to the database server component and can be any
device which will hold data. For example, the database can consist of any type of magnetic or
optical storage device for a computer (e.g., CD-ROM, internal hard drive, tape drive). The
database can be located remote to the server component (with access via modem or leased line)
or locally to the server component.






VI. DATABASE TYPE (RELATIONAL)

The database is preferably a relational database created/derived from existing biological
data sets and/or databases (e.g., SwissProt, GenBank) that is organized and accessed according
to relationships between data items. In a preferred embodiment, the database is SQL compatible
with standard JDBC supported mechanisms and datatypes. The relational database would
preferably consist of a plurality of tables (entities). The rows of a table represent records
(collections of information about separate items) and the columns represent fields (particular
attributes of a record). In its simplest conception, the relational database is a collection of data
entries that "relate" to each other through at least one common field.

In a preferred embodiment, for example, portions of the database may be organized by
identifying families of homologous protein sequences within the database; constructing for each
family a multiple sequence alignment, an evolutionary tree, and ancestral sequences at nodes in
the tree; constructing a corresponding multiple alignment for the DNA sequences that encode
the proteins in the protein family; assigning silent and expressed mutations in the DNA
sequences to each branch of the DNA evolutionary tree; predicting a secondary structure for the
family; and aligning this predicted secondary structure with the ancestral sequence at the root
of the tree.

The predicted structural models and their corresponding models of ancestral sequences
may be used to organize the protein sequence database to provide rapid search and retrieval of
sequence databases. In this case, to apply the models of secondary structure predicted using the
methods disclosed in U.S. Patent No. 5,958,784, incorporated herein by reference, the predicted
models are set within the evolutionary history of the protein family. The evolutionary history
is defined by a multiple alignment of the sequences of members of the protein family, an
evolutionary tree connecting these members, and ancestral sequences reconstructed in
probabilistic form throughout the tree.

In the present invention, a multiple alignment, an evolutionary tree, and ancestral
sequences at nodes in the tree can be constructed by methods well known in the art for a set of
homologous proteins. These three elements of the description are interlocking, as is well known
in the art. The presently preferred method of constructing ancestral sequences for a given tree
is the maximum parsimony method, as implemented (for example) in the commercially
available program MacClade [W.P. Maddison, D.R. Maddison, MacClade, Analysis of
Phylogeny and Character Evolution, Sinauer Associates, Sunderland MA (1992)]. Trees are
compared based on their scores using either maximum parsimony or maximum likelihood
criteria, and selected based on considerations of score and correspondence to known facts.
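
As an illustration of maximum parsimony reconstruction, the following minimal sketch applies the bottom-up pass of Fitch's algorithm to a single alignment column. Fitch's algorithm is used here as an assumed stand-in; the text does not state which parsimony procedure MacClade applies, and the tree encoding and function name are hypothetical.

```python
def fitch(tree, leaf_states):
    """Return (state_sets, changes) for one alignment column.

    tree: nested 2-tuples of leaf names, e.g. (("a", "b"), "c").
    leaf_states: dict mapping leaf name -> observed residue.
    """
    sets = {}
    changes = 0

    def visit(node):
        nonlocal changes
        if isinstance(node, str):              # leaf: the observed residue
            sets[node] = {leaf_states[node]}
            return sets[node]
        left, right = (visit(child) for child in node)
        common = left & right
        if common:                             # children agree: keep the intersection
            sets[node] = common
        else:                                  # children disagree: union, one change
            sets[node] = left | right
            changes += 1
        return sets[node]

    visit(tree)
    return sets, changes


tree = (("human", "chimp"), ("mouse", "rat"))
column = {"human": "A", "chimp": "A", "mouse": "G", "rat": "A"}
state_sets, n_changes = fitch(tree, column)
```

A node whose state set holds more than one residue is an ambiguous reconstruction, which is the situation the fractional assignments described later are meant to handle.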

Next, a corresponding multiple alignment is constructed by methods well known in the
art for the DNA sequences that encode the proteins in the protein family. The multiple
alignment is constructed in parallel with the protein alignment. In regions of gaps or
ambiguities, the amino acid sequence alignment can be adjusted to give the alignment with the
most parsimonious DNA tree. The presently preferred method of constructing ancestral DNA
sequences for a given tree is the maximum parsimony method. The DNA and protein trees and
multiple alignments must be congruent, meaning that when amino acids are aligned in the
protein alignment, the corresponding codons are aligned in the DNA alignment. Likewise, the
connectivity of the two evolutionary trees must show the same evolutionary relationships. In
regions where the connectivity of the amino acid tree is not uniquely defined by the amino acid
sequences, the tree that gives the most parsimonious DNA tree is used to decide between two
trees or reconstructions of equal value. Finally, the ancestral amino acids reconstructed at nodes
in the tree must correspond to the reconstructed codons at those nodes. When the ancestral
sequences are ambiguous, and where the DNA sequences cannot resolve the ambiguity, the
reconstructed DNA sequences must be ambiguous in parallel. Approximate reconstructions are
valuable even when exact reconstructions are not possible from available data, and the tree is
preferably constrained to correspond to evolutionary relationships between proteins inferred
from biological data (e.g., cladistics).
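
The congruence requirement between the protein and DNA alignments can be sketched as a simple per-row check. The function below is a hypothetical illustration that assumes gaps are written as "-" and that each protein column corresponds to exactly three DNA columns; it checks only the placement of gaps and does not verify that each codon translates to the aligned residue.

```python
def is_congruent(protein_row: str, dna_row: str) -> bool:
    """True if each aligned residue sits over a codon and each gap over '---'."""
    if len(dna_row) != 3 * len(protein_row):
        return False
    for i, residue in enumerate(protein_row):
        codon = dna_row[3 * i:3 * i + 3]
        # A protein gap must coincide with a full codon gap, and vice versa.
        if (residue == "-") != (codon == "---"):
            return False
        # A residue may not sit over a partially gapped codon.
        if residue != "-" and "-" in codon:
            return False
    return True
```

For example, the protein row "M-K" is congruent with the DNA row "ATG---AAA" but not with "ATGAAA---", where the gap has shifted by one codon.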

Next, mutations in the DNA sequences are assigned to each branch of the DNA
evolutionary tree. These may be fractional mutations to reflect ambiguities in the sequences at
the nodes of the tree. When ambiguities are encountered, alternatives are weighted equally.
Mutations along each branch are then assigned as being either "silent," meaning that they do not
have an impact on the encoded protein sequence, or "expressed," meaning that they do have an
impact on the encoded protein sequence. Fractional assignments are made in the case of
ambiguities in the reconstructed sequences at nodes in a tree.
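
The silent/expressed assignment with equal weighting of alternatives can be sketched as follows. This is a simplified illustration under stated assumptions: it uses the standard genetic code, treats each differing codon pair as one mutation regardless of how many nucleotides differ, and the function name and the list-of-alternatives input format are hypothetical.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in TCAG order.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify(parent_codons, child_codons):
    """Return fractional (silent, expressed) mutation counts for one branch site.

    Each argument lists the equally weighted alternative codons reconstructed
    at the parent and child nodes of the branch.
    """
    pairs = list(product(parent_codons, child_codons))
    weight = 1.0 / len(pairs)
    silent = expressed = 0.0
    for parent, child in pairs:
        if parent == child:
            continue                            # no mutation for this alternative
        if CODON_TABLE[parent] == CODON_TABLE[child]:
            silent += weight                    # same amino acid encoded
        else:
            expressed += weight                 # encoded amino acid changes
    return silent, expressed


# Parent is GAA (Glu); the child is ambiguous between GAG (Glu) and GAT (Asp).
result = classify(["GAA"], ["GAG", "GAT"])
```

In this example the branch receives half a silent mutation and half an expressed mutation, reflecting the unresolved ambiguity at the child node.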

Thereafter, intermediates in the evolutionary tree are prepared in the laboratory
using protein engineering and biotechnology methods well known in the art [Jermann, T.M.,
Opitz, J.G., Stackhouse, J., Benner, S.A. "Reconstructing the evolutionary history of the
artiodactyl ribonuclease superfamily," Nature 374:57-59 (1995)].

The method disclosed in U.S. Patent No. 5,958,784 can then be applied to each protein
family. For each protein family, a secondary structure is predicted for the family, and this
predicted secondary structure is aligned with the ancestral sequence at the root of the tree. If the
root of the tree is unassigned, the predicted secondary structure is aligned with the ancestral
sequence calculated for an arbitrary point near the center of gravity of the tree.

The quality of a multiple alignment and the precision of the reconstructed ancestral
sequences decrease if the family includes proteins whose sequences diverge by more than 150
PAM units, where a PAM unit is the number of point accepted mutations per 100 amino acids.
Conversely, the quality of the secondary structure prediction determined by the methods
disclosed in U.S. Patent No. 5,958,784 becomes worse if the family does not contain at least
some protein sequence pairs 40 PAM units or more divergent. Families used in this invention
therefore preferably contain at least some protein sequence pairs more than 40 PAM units
divergent, but contain no protein pairs more than 150 PAM units divergent. Most preferably, a
majority of protein pairs are 40 or more PAM units divergent and no protein pair is more than
120 PAM units divergent. The sequences in a protein family are, however, generally determined
by the availability of sequences in the database.
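
The divergence criteria above can be expressed as a short check over the pairwise PAM distances of a family. The sketch below is illustrative only; the function name and the returned labels are hypothetical, and it applies the "40 or more" reading of the lower bound to both tiers.

```python
def family_grade(pairwise_pam):
    """Grade a family from its pairwise PAM distances (one value per protein pair)."""
    distances = list(pairwise_pam)
    divergent = sum(1 for d in distances if d >= 40)   # pairs 40 PAM or more apart
    widest = max(distances)
    # Most preferred: a majority of pairs at 40+ PAM and no pair beyond 120 PAM.
    if divergent > len(distances) / 2 and widest <= 120:
        return "most preferred"
    # Preferred: at least one pair at 40+ PAM and no pair beyond 150 PAM.
    if divergent >= 1 and widest <= 150:
        return "preferred"
    return "outside preferred range"
```

For instance, pairwise distances of 45, 60 and 30 PAM would be graded "most preferred", while 20, 30 and 130 PAM would be graded "preferred".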

Once the models for secondary structure predicted by the methods disclosed in U.S.
Patent No. 5,958,784 are placed into their evolutionary context as described above, the context
can be used in the following ways:

1. Rapidly searchable database

The above-noted steps provide one method to organize the protein sequence database
in a rapidly searchable form. The ancestral sequences and the predicted secondary structures
associated with the families defined by these steps are surrogates for the sequences and
structures of the individual proteins that are members of the family. The reconstructed ancestral
sequence represents in a single sequence all of the sequences of the descendent proteins. The
predicted secondary structure associated with the ancestral sequence represents in a single
structural model all of the core secondary structural elements of the descendent proteins. Thus,
the ancestral sequences can replace the descendent sequences, and the corresponding core
secondary structural models can replace the secondary structures of the descendent proteins.

This makes it possible to define two surrogate databases, one for the sequences, the other
for secondary structures. The first surrogate database is the database that collects from each of
the families of proteins in the databases a single ancestral sequence, at the point in the tree that
most accurately approximates the root of the tree. If the root cannot be determined, the ancestral
sequence chosen for the surrogate sequence database is near the center of mass of the tree. The
second surrogate database is a database of the corresponding secondary structural elements. The
surrogate databases are much smaller than the complete databases that contain the actual
sequences or actual structures for each protein in the family, as each ancestral sequence
represents many descendent proteins.

Searching the surrogate databases for homologs of a probe sequence generally proceeds
in two steps. First, the probe sequence (or structure) is matched against the database of surrogate
sequences (or structures). Should the search yield a significant match, the probe sequence is
identified as a member of one of the families already defined. Second, the probe sequence is
matched with the members of this family to determine where it fits within the evolutionary tree
defined by the family. The multiple alignment, evolutionary tree, predicted secondary structure
and reconstructed ancestral sequences may be different once the new probe sequence is
incorporated into the family. If so, the different multiple alignment, evolutionary tree, and
predicted secondary structure are recorded, and the modified reconstructed ancestral sequence
and structure are incorporated into their respective surrogate databases for future use.
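
The two-step search can be sketched as follows. This is a toy illustration, not the matching procedure of the invention: the similarity measure is a plain fraction of identical positions standing in for a real alignment score, and the function names, data layout, and threshold are all assumptions.

```python
def identity(a: str, b: str) -> float:
    """Toy similarity: fraction of matching positions (stand-in for an alignment score)."""
    n = min(len(a), len(b))
    return sum(x == y for x, y in zip(a, b)) / n if n else 0.0


def surrogate_search(probe, surrogates, families, threshold=0.6):
    """Step 1: match the probe against one ancestral sequence per family.
    Step 2: find the probe's closest member within the best-matching family."""
    best_family = max(surrogates, key=lambda name: identity(probe, surrogates[name]))
    if identity(probe, surrogates[best_family]) < threshold:
        return None                             # no significant match
    members = families[best_family]
    closest = max(members, key=lambda m: identity(probe, members[m]))
    return best_family, closest


surrogates = {"fam1": "MKTAYIAK", "fam2": "GGSLLPWE"}
families = {
    "fam1": {"seqA": "MKTAYIAR", "seqB": "MKSAYLAK"},
    "fam2": {"seqC": "GGSLLPWE"},
}
hit = surrogate_search("MKTAYIAR", surrogates, families)
```

The first step touches only one sequence per family, which is why the surrogate database can be searched much faster than the full set of member sequences.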

Alignment of ancestral sequences with ancestral sequences has an advantage in detecting
longer distance homology, as the ancestral sequences contain information about what amino acid
residues are conserved within the nuclear family, and therefore are more likely to be conserved
between diverging nuclear families.

VII. DATABASE TABLES

Each separate table (entity) represents a different schema of interconnections between
some or all of the protein sequences from the underlying biological data set/database.
Specifically, in a preferred embodiment, the relational database for storing biological data
includes a plurality of interrelated tables wherein each table comprises an attribute having a
common domain with an attribute of at least one other table in the database. The invention
provides for viewing patterns in the evolutionary relationships between genomic sequences on
the basis of the data stored in the relational database.

Three versions of this schema can be viewed using the entity relationship diagrams of
Figures 4A - 4C, which illustrate the fields for each type of data and the interconnections
between data types. The following is a detailed description of a collection of tables (entities) in
one embodiment of the present invention with the attributes and keys of each relation. The Id
field of all relations except those of the SeqAnnType table is assigned automatically when
inserting the relation. In particular, Figure 4C shows the entity relationship diagram for DNA.

1. AASequence Table

The AASequence table contains all amino acid sequences available in the database. 





#   Name         Type           Key  Description
1   Id           INTEGER        P    The internal id, i.e. the foreign key for all modules, sequence annotations and sequence keys belonging to this sequence.
2   SeqDBId      INTEGER             The sequence database the sequence belongs to.
3   Description  VARCHAR(255)        A description of the sequence.
4   Sequence     LONGVARCHAR         The sequence itself as encoded characters according to Alphabet.getAminoAcidAlphabet().



2. Catalog Table 

The Catalog table contains all catalogs available in this database. 



Name         Type           Key  Description
Id           INTEGER             The internal id, i.e. the foreign key for all families belonging to this catalog.
Name         VARCHAR(64)         The unique name for external identification.
Description  VARCHAR(255)        A description of the catalog's contents.
Version      VARCHAR(16)         The version of this catalog (a date or a version according to some numbering scheme).
DBVersion    INTEGER             The database format version of this catalog. The format version is used to verify the compatibility of database and MC server.
ProfilePAM   float[]             The list of PAM of the available profile databases. It caches the result of a time consuming query.
NrModules    INTEGER             The total number of modules in this catalog. It caches the result of a time consuming query.
NrFamilies   INTEGER             The total number of families in this catalog. It caches the result of a time consuming query.

3. FamAnnotation Table

The FamAnnotation table contains all annotations of all families. An annotation always
belongs to exactly one family.



#   Name         Type              Key  Description
1   Id           INTEGER           P    The internal id.
2   FamId        INTEGER           F    Foreign key to the corresponding family.
3   Annotation   ds.MSAAnnotation       The annotation of the MSA of the family.

4. Family Table 

The Family table contains all families of all catalogs. A family always belongs to 
exactly one catalog and contains an arbitrary number of modules. 



#   Name          Type           Key  Description
1   Id            INTEGER        P    The internal id, i.e. the foreign key for all modules, profiles and MSA annotations belonging to this family.
2   Name          VARCHAR(64)    S    The name of this family assigned during modularization.
3   Modification  INTEGER        S    The modification number of this family inside its once assigned name.
4   CatId         INTEGER             Foreign key to the containing catalog.
5   Description   VARCHAR(255)        The description of this family (usually generated automatically during modularization).
6   Tree          ds.Tree             The precalculated phylogenetic tree of this family.



5. Module Table

The Module table contains all modules of all catalogs. A module always belongs to
exactly one sequence and exactly one family. The multiple sequence alignment is implicitly
stored as a Gaps structure from which the MSA can be directly constructed.

#   Name               Type      Key  Description
1   Id                 INTEGER   P    The internal id.
2   SeqId              INTEGER   F    Foreign key to the containing sequence.
3   SeqStart           INTEGER        Start of this module in the containing sequence (0 based).
4   SeqLength          INTEGER        Length of this module in the containing sequence.
5   FamId              INTEGER   F    Foreign key to the containing family.
6   FamIndex           INTEGER        Index of this module in the containing family.
7   StartSignificance  REAL           Significance of start point of this module.
8   EndSignificance    REAL           Significance of end point of this module.
9   Quality            REAL           Overall quality of this module.
10  MSAGaps            ds.Gaps        Gaps to be inserted in this module's sequence to obtain the MSA sequence.
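
The idea of storing the alignment implicitly as gaps, as described for the MSAGaps attribute of the Module table, can be sketched as follows. The internal layout of the ds.Gaps structure is not given in the text, so the (position, length) pair representation and the function name used here are assumptions made for illustration.

```python
def apply_gaps(module_seq: str, gaps) -> str:
    """Insert gap runs into a module's sequence to obtain its MSA row.

    gaps: (position, length) pairs; `position` indexes the ungapped sequence
    and marks where a run of `length` gap characters is inserted.
    """
    row = []
    cursor = 0
    for position, length in sorted(gaps):
        row.append(module_seq[cursor:position])   # residues before this gap run
        row.append("-" * length)
        cursor = position
    row.append(module_seq[cursor:])               # residues after the last gap run
    return "".join(row)
```

With this representation only the gap positions need to be stored per module, and the aligned row is rebuilt from the ungapped sequence on demand.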



6. Profile Table 

The Profile table contains all profiles of all families. A profile always belongs to exactly
one family. For each family, there may be several profiles at different PAM.

#   Name      Type     Key  Description
1   Id        INTEGER  P    The internal id.
2   FamId     INTEGER  F    Foreign key to corresponding family.
3   Pam       REAL     S    PAM distance from the PAS of the family.
4   Sequence  BLOB          The actual profile sequence in packed representation (byte[]).



WO 01/20535                                                                PCT/US00/25247

7. SeqAnnotation Table 

The SeqAnnotation table contains all annotations of all sequences in this database. Each
annotation belongs to exactly one sequence.

#   Name         Type          Key  Description
1   Id           INTEGER       P    The internal id.
2   SeqId        INTEGER       F    Foreign key to the containing sequence.
3   SeqStart     INTEGER            Start of this annotation in the containing sequence (0 based).
4   SeqLength    INTEGER            Length of this annotation in the containing sequence.
5   Type         INTEGER       F    The sequence annotation type as defined by SeqAnnType.
6   Description  VARCHAR(255)       A description of the annotation in addition to the one provided by SeqAnnType. It may be missing, i.e. NULL.



8. SeqAnnType Table 

The SeqAnnType table contains all types of all sequence annotations in this database.
Its main purpose is to provide standard descriptions for each type to be displayed in the GUI.
This entity has more the character of a lookup table than of a real database entity. The Id
attribute of each relation is assigned manually. The semantics of some Ids must be known to,
e.g., the GUI, as different annotations will be displayed differently. Types are collected into
groups (type id ranges) whose members normally do not overlap (e.g. secondary structure or
binding sites). The groups are predefined in mastercatalog.ds.SeqAnnGroup.



#   Name         Type          Key  Description
1   Id           INTEGER       P    The internal id.
2   Tag          VARCHAR(32)   S    The tag used in the original annotation database corresponding to this type (e.g. a feature table tag).
3   Description  VARCHAR(255)       A description of the annotation.



9. SequenceDB Table 

The SequenceDB table contains all sequence databases available in this database.

#   Name         Type          Key  Description
1   Id           INTEGER       P    The internal id, i.e. the foreign key for all sequences belonging to this database and all catalogs built from this database.
2   Name                       S    The unique name for external identification.
3   Description  VARCHAR(255)       A description of the database contents.
4   Version      VARCHAR(16)        The version of this database (a date or a version according to some numbering scheme).
5   SearchKeys   String[]           The list of available search keys for sequence searches. It caches the result of a time consuming query.
6   SeqAnnTypes  String[]           The list of available sequence annotations. It caches the result of a time consuming query.

10. SequenceKey Table

The SequenceKey table contains all indexed keys of a sequence like ID, accession
numbers, EC numbers and so on. Storing these keys separately allows additional key types to
be added without modifying the database structure or any code. Furthermore, there may be
multiple occurrences of the same key type for any sequence.



#   Name      Type          Key  Description
1   Id        INTEGER       P    The internal id.
2   KeyType   VARCHAR(16)        The key type, e.g. "EC".
3   KeyValue  VARCHAR(255)  S    The key value, e.g. "4.2.1.1".
4   SeqId     INTEGER       F    Foreign key to corresponding sequence.



Short arrays and graphs are preferably stored as BLOBs (Binary Large Objects) to
prevent uncontrolled growth of the number of entities in the design. Only large arrays of variable
size are stored in their own relation by properly normalizing the database. The mapping between
transient Java objects and persistent database relations (rows in a DB table) is based on a unique
"Id" (an INTEGER) for each relation. References to other objects in the database are therefore
always INTEGER foreign keys.

For each table, there is a subclass whose instances are Java objects corresponding to
relations, and a subclass representing the corresponding entity and providing the necessary SQL
code.
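
The object-to-relation mapping described above can be sketched with a few of the tables. The example below is a hedged illustration in Python using the standard sqlite3 module rather than Java and JDBC; the column types are SQLite stand-ins for those listed in the tables, only a subset of attributes is shown, and the class and method names are hypothetical.

```python
import sqlite3
from dataclasses import dataclass

SCHEMA = """
CREATE TABLE SequenceDB (Id INTEGER PRIMARY KEY, Name TEXT UNIQUE, Description TEXT);
CREATE TABLE AASequence (Id INTEGER PRIMARY KEY,
                         SeqDBId INTEGER REFERENCES SequenceDB(Id),
                         Description TEXT, Sequence TEXT);
CREATE TABLE SequenceKey (Id INTEGER PRIMARY KEY, KeyType TEXT, KeyValue TEXT,
                          SeqId INTEGER REFERENCES AASequence(Id));
"""


@dataclass
class AASequence:
    """Transient object corresponding to one relation (row) of AASequence."""
    id: int
    seq_db_id: int
    description: str
    sequence: str


class AASequenceEntity:
    """Represents the table itself and supplies the SQL for it."""

    def __init__(self, conn):
        self.conn = conn

    def insert(self, seq_db_id, description, sequence) -> AASequence:
        # The Id is assigned automatically by the database on insertion.
        cur = self.conn.execute(
            "INSERT INTO AASequence (SeqDBId, Description, Sequence) VALUES (?, ?, ?)",
            (seq_db_id, description, sequence))
        return AASequence(cur.lastrowid, seq_db_id, description, sequence)

    def load(self, row_id) -> AASequence:
        row = self.conn.execute(
            "SELECT Id, SeqDBId, Description, Sequence FROM AASequence WHERE Id = ?",
            (row_id,)).fetchone()
        return AASequence(*row)


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
db_id = conn.execute("INSERT INTO SequenceDB (Name) VALUES ('demo')").lastrowid
entity = AASequenceEntity(conn)
stored = entity.insert(db_id, "example protein", "MKTAYIAK")
# A new key type needs only a new row, not a schema change.
conn.execute("INSERT INTO SequenceKey (KeyType, KeyValue, SeqId) VALUES (?, ?, ?)",
             ("EC", "4.2.1.1", stored.id))
```

The integer Id is the only link between the in-memory object and its row, and the SequenceKey row refers to the sequence through that same integer.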

VIII. GUI 

The operation of the present invention will now be described with respect to the 
graphical user interface and its navigation of the database. 

The graphical user interface of the present invention allows the user to browse the
sequence database, perform searches, and examine evolutionary relationships. For example, the
GUI is a browser that can be used to follow evolutionary relationships through the genomic
sequence; the browsing provides interactive trees, multiple sequence alignments, and families;
the database of families can be searched using a novel method that represents each family as a
probabilistic sequence; rates of evolution are displayed on evolutionary trees and provide
evidence of changes in function.

As in most "windows" applications, each display screen of the invention generally
comprises a window title bar, a menu bar (with commands such as File, Edit, and the like), a tool
bar (with options such as Close, Paste, Clear, and the like), and an information display region.
The information display region may, for example, display a query window or a results window.

Figure 6 is a navigational flowchart of a preferred embodiment of the invention
illustrating all major windows and options available. Rectangles represent the windows and
circles represent options available within their respective windows. Arrows indicate direction
[and each layer of the program is color coded].

Preferably, the initial window is a LOGIN window (not shown) wherein a user may
enter a valid user name and password to access the system.

Upon a successful login, the CATALOG SELECTION WINDOW 100 appears in the
user's display as shown in a pictorial diagram in Figure 7 according to an embodiment of the
invention. The CATALOG SELECTION WINDOW 100 displays the available catalogs for
selection by name, version, description, number of families and number of modules.

Each catalog is constructed to provide a view of relationships between (some or all of)
the protein sequences in the database. Different catalogs emphasize different features of the
protein sequence database. For example, one catalog might emphasize repeat units within
proteins, another might focus on alignments which comprise the whole length of genes (e.g.,
a gene product catalog), and another might focus on local patterns of divergence between protein
sequences (e.g., a modularized catalog). Catalogs are composed of families of modules, each
module defining a region of a protein sequence. Thus, the families relate regions of different
protein sequences in biologically meaningful ways.

In the database configuration, the Catalog table (described previously) contains the
following: Id, Name, Description, Source, Version, DBVersion, ProfilePAM, NrModules,
NrFamilies, MinFamId, MaxFamId, SearchKeys, SeqAnnTypes, RefSeqDBs.

Examples of catalogs which may be included in the present invention include the 
following: 

1. Modularized catalog: A description of relationships between protein sequences based
on local patterns of divergence (according to a model of evolution). There are many ways in
which such a catalog might be constructed. Published examples include ProDom and DOMO.

2. Gene product catalog: A description of relationships between protein sequences, one or
more of which must comprise whole genes. An example of a method for building a catalog of this
type can be seen in Monica Riley's work on modularization of the E. coli genome.

3. Repeat catalog: A description of relationships between regions of proteins which form
identifiable repeats of at least 20 residues. The DOMO database shows certain features of this
type, although this database is more properly defined as a modularized catalog.

4. Entry catalog: A description defining the 'classical' method of sequence database
construction, in which there are no explicit relationships between different sequence entries.
It is simply a dictionary of all available protein sequences in the database.

Catalogs can be subsets of all data in the database. Typically this is most (scientifically)
useful for the modularized catalogs, where focusing on a subset of all the gene products (e.g. just
mammalian sequences) is biologically meaningful (e.g., mammalian modularized catalog,
bacterial modularized catalog).

Selecting a catalog by double clicking on one of the available catalogs displays a query
window for that catalog. Alternatively, you can select a catalog entry with the mouse and click
on the Open button on the toolbar.

There is a special catalog, 'the Sequence Entry Catalog', that contains just the individual
protein sequences. This catalog necessarily does not have family names or allow sequence
searching; it merely provides direct access to individual protein sequences and can be searched
by keyword.

Once a catalog is chosen, there are several different ways to search the catalog: a) by
catalog family identifier, b) by sequence, and c) by keyword. In a preferred embodiment, the
query type is selected using "tabs" associated with each search type. Each search method yields
unique results, providing a rich mechanism for navigation to related modules. Not all of these
methods may be available for every catalog (i.e., if there is no family, there would be no
family identifier). The methods are described in more detail hereafter.

a) Figure 8 - Search by Name (Family identifier/Module ID): Each family in a catalog
preferably has a unique identifier that defines a particular family. Preferably, these family
names are generated automatically when each catalog is built from the protein sequence data.
This search window 110 allows you to obtain a specific family number of interest as shown in
Figure 8. The result of this search is the Module Family Summary (MFS) window 140 of Figure
9 showing a graphical view of the associated proteins and their modules.

b) Figure 10 - Search by Sequence: This window 120 allows you to search a protein against
a catalog for homologous (evolutionarily related) protein families as shown in Figure 10.
Homology is an important concept in extracting information from sequence databases because
conclusions can be drawn about the chemical behavior and biological function when two
proteins are homologous. One way to determine whether two proteins are homologous is to
compare their amino acid sequences. Procedures are well established in the art for comparing
two protein sequences, scoring similarities, and using this score to assess the likelihood that the
similarities arose by reason of common ancestry rather than by random chance [Gonnet, G.H.,
Cohen, M.A., Benner, S.A. Exhaustive matching of the entire protein sequence database. Science
256, 1443-1445 (1992)]. You can enter the amino acid sequence of interest, specify a minimum
score and a maximum number of matches, and adjust the PAM sensitivity. The sensitivity of the
search can be set, for example, at 50 PAM to look for close homologs or 150 PAM to detect long
distance homology. Only the hits above the specified minimum score and up to the maximum
number of matches are reported. This search can take a while for large probe sequences. You can
abort the search at any time with the cancel button.

The result of this search is a Sequence Search Results (SSR) window 125 showing a
graphical view of the relationship between the probe sequence and the associated families as
shown in Figure 11. This window shows the relationship between the probe sequence chosen
for the query and each of the families that are related. The score corresponds to a 'log odds
score', which compares the probability that the sequences are related according to the
model of evolution with the probability that the similarity is by chance.
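
A log odds score of this kind can be written as a one-line calculation. The sketch below is illustrative only: the base-10 logarithm and the scale factor of 10 are assumptions, since the text does not state the scaling used, and the function name is hypothetical.

```python
import math


def log_odds(p_related: float, p_chance: float, scale: float = 10.0) -> float:
    """Scaled base-10 log of the ratio of the two probabilities."""
    return scale * math.log10(p_related / p_chance)
```

Under this convention a positive score favors relatedness under the model of evolution, a score of zero means the two explanations are equally likely, and a negative score favors chance similarity.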

Double clicking on any module shown in the SSR window will display the module family
summary (MFS) window 140 for that family. The alignment of the probe sequence with the
summary can then be displayed.

c) Figure 12 - Search by Keyword: This window 130 provides searches of the protein
sequence database by keyword, according to annotations of proteins in the original sequence
database as shown in Figure 12. Keywords provided include selection by organism, by
classification, by gene name and by gene product description.

The result of this search is a keyword search results (KSR) window 135, Figure 13,
which includes a list of database sequence IDs whose database sequence annotations match
the keywords in the description. The display shows a graphical view of the individual protein
sequences which fit the keyword search criteria. The graphical view is shown as a linear
arrangement of schematic boxes (spatially oriented) representing the existing modules found in
the selected catalog along the identified amino acid range (AA). Each of the schematic boxes
is preferably identified with its corresponding Module ID number and is differentiated by color
or another form of distinctive representation. In some cases, sequences may match
a particular keyword but no modularization information is available (and no schematic boxes
are shown) because these sequences were not part of the set included in the currently selected
catalog.

Double clicking on any module shown in the KSR window 135 will display the module 
family summary (MFS) window 140 for the associated family. 

The Module Family Summary (MFS) window 140 of Figure 9, showing a graphical view
of the associated proteins and their modules, will now be described in detail. This window is the
gateway to the navigational power of the present invention, providing access to other displays.
The window shows all of the sequences in the currently selected catalog that are members of a
particular family, where the family contains all of the sequences which have a particular module
of interest. All relationships between modules have been precalculated in the database. A
module is a subsequence that is a member of a family, drawn with a graphical length
proportionate to its length. Unidentified regions do not have schematic boxes. The module of
interest is preferably visibly distinguished from the other modules and its ID is identified in the
title bar of the window. The sequences in the family may also contain other modules. The
sequences are ordered to cluster modules which are closest together in evolutionary distance.

The window is preferably tabular with a separate numbered row for each sequence. The
columns preferably include the sequence ID, a description, the amino acid range and a graphical
view of the sequence shown as a linear arrangement of schematic boxes (spatially oriented)
representing the existing modules found in the selected catalog along the identified amino acid
range (AA). Each of the schematic boxes is preferably identified with its corresponding Module
ID number and is differentiated by color or another form of distinctive representation. The
schematic boxes for the currently selected module are vertically aligned and visually
distinguished, such as by color (red). The window's "tool tips" feature may expand any truncated
descriptions or provide additional information in a floating window when the pointer is placed
over a particular table entry. This window provides a good indication of any long distance
homology between various modules. Proteins that share a common module frequently possess
other homologous modules at analogous positions. Such relationships can be confirmed by
examining multiple sequence alignments and trees.

The toolbar of the MFS window allows you to perform many different tasks from this
point (e.g., print, export to disk, display multiple sequence alignment (MSA), and display
phylogenetic tree). Each of these tasks will now be discussed in detail. The family summary can
be printed directly to a printer available from a local computer. The description of the
modularization of this family can be exported to file.

We now turn to Figure 14 for the display of the multiple sequence alignment (MSA) of the
current family. Selecting the MSA button on the window's toolbar shows the multiple sequence
alignment (the way in which the modules are related at the amino acid level) in the MSA
window 150 of Figure 14. This window provides detailed evolutionary information at the protein
sequence level, allowing the pattern of conservation and variation of amino acid composition of
the module to be followed. The MSA is preferably colored according to the hydrophobic or hydrophilic
nature of the amino acids (e.g., RED indicates hydrophobic propensity and BLUE indicates hydrophilic).
The numbering system used in the MSA window preferably corresponds to the numbering system
implemented by both the MFS window and the Tree window. Highlighted on each sequence are
annotated regions of individual protein sequences. Moving the cursor (pointer) over a chosen
highlighted region displays the annotation in a floating window. These can be hand-crafted
comments, such as feature table entries from SwissProt, or automatically generated from patterns
such as those in PROSITE (or others such as PRINTS). Different annotations can be selected
from the option bar at the bottom of the MSA window.
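
The residue coloring described above can be sketched in Java as a simple lookup. This is a sketch under stated assumptions, not the code of the invention. The class name ResidueColoring is hypothetical, the specification does not list which amino acids fall in each class, so the two groupings below are illustrative only, and the fallback shade for every other character is an assumption.

```java
/** Hypothetical sketch: chooses a display shade for one residue of an alignment. */
final class ResidueColoring {

    enum Shade { RED, BLUE, BLACK }

    // Illustrative groupings only; the specification does not enumerate them.
    private static final String HYDROPHOBIC = "AVLIMFWC";
    private static final String HYDROPHILIC = "DEKRNQSTH";

    /** Returns RED for hydrophobic, BLUE for hydrophilic, BLACK for anything else. */
    static Shade shadeFor(char residue) {
        char r = Character.toUpperCase(residue);
        if (HYDROPHOBIC.indexOf(r) >= 0) {
            return Shade.RED;
        }
        if (HYDROPHILIC.indexOf(r) >= 0) {
            return Shade.BLUE;
        }
        // Assumption: remaining residues and gap characters share one neutral shade.
        return Shade.BLACK;
    }
}
```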

Analyzing correlations in the patterns of substitution in the sequences for each module
family allows predictions to be made about the nature of underlying structural or functional
constraints. Preferably, the annotations provided are: VParse, showing the location of putative
structure breaking residues or motifs; Score, showing the degree of conservation at each position
in the alignment (this value is dependent on both the evolutionary distances between the
sequences and the mutability of the individual amino acids, and is a sensitive indicator of the
significance of conserved sites); PredSI, indicating the predicted solvent accessibility of the
residue at that position; and PredSec, indicating the predicted secondary structure. If the PAM
width of the family is poor or the number of sequences is small, then there may be insufficient
information for a secondary structure prediction. If the given module aligns significantly with
any entry in the PDB, indicating a confident homology, a string of secondary structural elements
corresponding to that alignment can be seen at the bottom of the MSA window below the PredSI and
PredSec strings.

As previously described, the multiple sequence alignment window shows the amino-acid
by amino-acid relationship between proteins which are in the same family. Some preferred
features include: a) Coloring: Hydrophobic residues in red, hydrophilic residues in blue,
amphiphilic residues in black; b) Parses: Regions of sequence which are likely to represent
secondary structure breaking positions are indicated; c) PredSI: Predicted surface/interior
residues are indicated; d) PredSec: Predicted secondary structure is shown; e) Experimental Sec:
If known, the secondary structure of experimentally determined homologs to the family is
shown; f) Annotation: Sequence features, such as motifs of well known function, can be selected
from the 'annotations:' list box in the window (see below). Functions of the MSA window
include: a) Export sequence data: The sequence of selected modules of this family can be
exported to file in a variety of formats; b) Copy sequence data to clipboard: The sequence of
selected modules of this family is copied to the clipboard when this button is selected; c) Print:
The family summary can be printed directly to any printer available from your local computer;
d) Annotations: The sequences in the MSA can be highlighted for particular annotations (usually
specific sequence motifs or special database annotations). One such collection of annotations
is the 'Prosite' database. Regions of sequence corresponding to Prosite annotations are colored
in orange. Moving the cursor over that region of the text displays the details of the annotation
in a floating window. The complete set of annotations visible in the current window can be
obtained by looking at the 'Annotation Types' menu. A subset of all the annotations in the
current collection can be obtained by customizing with the tickboxes in the annotation menu.

Selection of sequences/modules will now be described. You can select some or all of
the sequences of individual modules. Sequences are selected individually in the MSA display
with single mouse clicks. To combine your selection with previous selections, keep the <CTRL>
key depressed while selecting. In addition to the toolbar tasks, other features are available from
the window. For example, double clicking on any sequence in the ID column shows the protein
sequence window for that family (including catalog membership, description, and annotations).

We now turn to Figure 15 for the display of the evolutionary tree of the current family.
Selecting the Tree button on the window's toolbar shows the evolutionary tree in the Tree window 160
of Figure 15. This indicates the pattern of divergence/similarity between individual modules,
assuming that the distance between modules can be computed from the similarity in the protein
sequences. Specifically, trees show an estimate of the evolutionary history of a protein module,
constructed using the PAM distances between individual members of that family. Trees may
be displayed in either rooted or unrooted form; there is no significant distinction between these
representations, the location of the root being chosen to balance the tree.

On the branches of the tree, the lengths of the branches are displayed in PAM units. This
provides an estimate of divergence in composition of the various sequences. Selecting the
"kA/kS" key at the bottom of the window will display the ratio of the rate of expressed changes
at the DNA level to the rate of silent changes, i.e., the rate of mutation leading to changes at the
protein level calibrated against the rate of mutation leading to changes only at the DNA level.
The rates are preferably normalized so that, when reading expressed:silent ratios, a value
of around 1.0 indicates no selection, both synonymous (silent changes) and non-synonymous
(expressed changes) substitutions being equally likely (e.g., a pseudogene). The threshold level
of kA/kS is user adjustable on the slide scale. Separate coloring schemes are preferably used to
indicate branches above or below the threshold. For example, if kA/kS is less than the threshold
(default 1.0) the branch is colored blue, and if kA/kS is greater than the threshold, the branch is
colored red. If no DNA sequence information is available, then the branch is colored black. In
practice, proteins are normally under the influence of purifying selection so the ratios fall well
below this value. Therefore, where the ratio approaches or exceeds 1.0, the confidence that one
is looking at an episode of rapid sequence evolution (presumably to new function) increases.
For adaptations which occurred at longer evolutionary times, the value of the expressed:silent
ratio will appear lower as random mutation has increased the number of silent changes. A
suitable threshold value can be determined by examining the tree as a whole, which will contain
branches that have maintained purifying selection for longer periods, and comparing these values
with those that suggest more rapid changes at expressed sites.
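
The threshold coloring of branches described above can be sketched in Java as follows. This is a minimal illustration, not the code of the invention. The class name BranchColoring and the use of a null ratio to represent a branch without DNA sequence information are hypothetical, and since the description does not say how a ratio exactly equal to the threshold is colored, the sketch assumes red.

```java
/** Hypothetical sketch: colors a tree branch from its expressed:silent ratio. */
final class BranchColoring {

    enum BranchColor { BLUE, RED, BLACK }

    /**
     * @param kaKs      expressed:silent ratio for the branch, or null when no
     *                  DNA sequence information is available
     * @param threshold user adjustable threshold (the description gives a default of 1.0)
     */
    static BranchColor colorFor(Double kaKs, double threshold) {
        if (kaKs == null || kaKs.isNaN()) {
            return BranchColor.BLACK; // no ratio could be computed
        }
        // Assumption: a ratio exactly at the threshold is treated as "above".
        return kaKs < threshold ? BranchColor.BLUE : BranchColor.RED;
    }
}
```

Moving the slide scale would then amount to calling colorFor again for every branch with the new threshold.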
The graphic interface also offers a scaling facility to zoom in and zoom out using the slide
scale at the bottom of the window. Zooming may be necessary in order to see the PAM distance
labels or leaf identifiers. This information can also be seen in a floating window by positioning
the cursor over the leaves or branches. Individual branches can be displayed on separate trees
by selecting the appropriate branches and the "Zoom" key. The "fit" button sizes the tree to the
full window size.
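
One way the "fit" operation could be computed is sketched below in Java. This is an assumption for illustration only. The description does not state how the scale is derived, and the class name TreeViewport and the choice to preserve the aspect ratio of the tree are hypothetical.

```java
/** Hypothetical sketch: computes the zoom factor that fits a tree into a window. */
final class TreeViewport {

    /**
     * Returns the largest uniform scale at which a tree of the given unscaled
     * size still fits inside the window in both dimensions.
     */
    static double fitScale(double treeWidth, double treeHeight,
                           double windowWidth, double windowHeight) {
        if (treeWidth <= 0 || treeHeight <= 0) {
            throw new IllegalArgumentException("tree dimensions must be positive");
        }
        // Assumption: the same factor is applied horizontally and vertically.
        return Math.min(windowWidth / treeWidth, windowHeight / treeHeight);
    }
}
```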

As previously described, the Tree window shows the evolutionary relationship between
individual modules of a family, using distance calculated from a comparison of their amino acid
sequences. Some preferred features include: a) Coloring: blue edge, KaKs (see below) below
threshold; red edge, KaKs above threshold; black edge, KaKs not computed (no
DNA/unreliable DNA sequence). The threshold is selected by changing a slide bar on the
toolbar of the Tree window; b) Rooted/Unrooted: The tree can be displayed in rooted or in
unrooted form, depending on user preference. There is no difference in information content
between these two descriptions; the root of the tree is chosen for balance, not as a result of other
phylogenetic evidence. Some preferred functions include: a) Export tree description: The tree can
be exported to file in a variety of formats; b) Print: The tree can be printed directly to any printer
available from your local computer; c) Annotations: The sequences in the MSA can be
highlighted for particular annotations (usually specific sequence motifs or special database
annotations). One such collection of annotations is the 'Prosite' database. Regions of sequence
corresponding to Prosite annotations are colored in orange. Moving the cursor over that region
of the text displays the details of the annotation in a floating window. The complete set of
annotations visible in the current window can be obtained by looking at the 'Annotation Types'
menu. A subset of all the annotations in the current collection can be obtained by customizing
with the tickboxes in the annotation menu; d) Selecting sequences/modules: You can select
branches or leaves of the tree with single clicks of the mouse. Selecting a branch will result in
all of the branches and leaves being selected downstream of the root. (The root is marked with
a circle.) To combine your selection with previous selections, keep the <CTRL> key depressed
while selecting; e) Fit tree/Rescale tree: The fit button can be used to rescale the tree to fit into
the entire window. Pressing SHIFT and the left mouse button can be used to zoom in toward
the selected portion of the window, and SHIFT with the right mouse button will zoom out from
the selected portion of the window; f) Ka/Ks: You can display either KaKs or PAM distances
on the tree.

The MFS window, Tree window and MSA window are linked so that selections in one
window highlight sequences in the others. You can select some or all of the modules possessed
by individual sequences. Modules are selected individually in the MFS window with single
mouse clicks. All of the modules possessed by a protein can be selected at once by selecting the
module id (#) in the left hand column. To combine your selection with previous selections, keep
the <CTRL> key depressed while selecting.

Normally selections propagate through displays of the current family, i.e., selecting
sequences in any window that applies to a family highlights that module in every display for that
family: the Tree window, the MSA window and the MFS window. If the propagate check box
is set, selections propagate by sequence, i.e., all protein sequences possessing any selected
modules are highlighted. This mode is extremely useful when tracing relationships across
different protein families.
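
The propagation of a selection to every display of a family can be sketched in Java with a shared selection object that notifies registered listeners. This is a minimal sketch and not the IndexSelection code of Appendix A. The names SharedSelection, Listener, and click are hypothetical, and module identifiers are assumed to be plain strings.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: one selection shared by the MFS, MSA and Tree displays. */
final class SharedSelection {

    /** Implemented by each display that wants to highlight the current selection. */
    interface Listener {
        void selectionChanged(Set<String> selectedIds);
    }

    private final Set<String> selected = new LinkedHashSet<>();
    private final List<Listener> listeners = new ArrayList<>();

    void addListener(Listener listener) {
        listeners.add(listener);
    }

    /**
     * Handles a single click on a module. Without the control key the click
     * replaces the selection; with it, the module joins the previous selection.
     */
    void click(String moduleId, boolean ctrlDown) {
        if (!ctrlDown) {
            selected.clear();
        }
        selected.add(moduleId);
        Set<String> snapshot = current();
        for (Listener listener : listeners) {
            listener.selectionChanged(snapshot); // every display is told
        }
    }

    /** Returns a read-only copy of the current selection. */
    Set<String> current() {
        return Collections.unmodifiableSet(new LinkedHashSet<>(selected));
    }
}
```

In such a design each window registers itself once, so a window opened later only needs to register and read the current selection in order to show the same highlights.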

We will now describe navigation to another family (related by membership with
sequences in the current family). Double clicking on any module in the graphical representation
displays the family summary window corresponding to that module.

Notable features of the present invention include a) multi-catalog views where a user
can simultaneously view more than one catalog, b) tree to tree interactivity where active
selections from a current window get propagated throughout (selected and deleted), c)
connectivity of selections where a section of a tree as selected will highlight associated
information in other windows, which is continuously applied as windows are opened, and d) the MSA
window's Prosite annotations, tool tips to show annotations, and customization of annotation
display/view to select subsets.

Appendix A [20 pages] contains source code for certain of the functions of the present
invention written in Java, including IndexSelection.java for managing and propagating
selections, FamilyFrame.java for showing the data about a family as a table,
FamilyTableModel.java for representation of family information with sequences and their
modules, IndexSelectionListener.java as an interface for a class which listens to index
selection changes, and FamilySequenceRenderer.java for rendering family sequences wherein
single clicking on a module selects it and double clicking triggers opening of the corresponding
family frame.
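
The single click and double click behavior attributed to the renderer above can be illustrated with a short Java sketch. It is not the FamilySequenceRenderer.java of Appendix A. The names ModuleClickDispatcher, Actions, select, and openFamily are hypothetical, and the click count is assumed to be supplied by the windowing toolkit (for example, the value returned by MouseEvent.getClickCount() in AWT or Swing).

```java
/** Hypothetical sketch: routes mouse clicks on a module box to an action. */
final class ModuleClickDispatcher {

    /** Callbacks assumed to be provided by the family display. */
    interface Actions {
        void select(String moduleId);

        void openFamily(String moduleId);
    }

    /** A single click selects the module; a double click opens its family. */
    static void dispatch(int clickCount, String moduleId, Actions actions) {
        if (clickCount >= 2) {
            actions.openFamily(moduleId);
        } else if (clickCount == 1) {
            actions.select(moduleId);
        }
        // Other click counts are ignored.
    }
}
```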

Following are examples which illustrate procedures for practicing the invention. These
examples should not be construed as limiting.

Example 1

FIG. 16 depicts an example of consecutive screen displays for interactive and
progressive query-making activity from search by sequence. First, a sequence is typed or cut and
pasted into the query box 120. Minimum scope, maximum matches and PAM are adjusted by
the user as desired. The query is then run, resulting in the Sequence Search Results (SSR)
window 125. From this window, the user can progressively query the desired module(s) by
double-clicking on the module(s), which results in the MFS window 140 with the module of
interest designated in RED and positionally aligned with all other modules.
Example 2

FIG. 17 depicts an example of consecutive screen displays for interactive and
progressive query-making activity from search by keyword. The search term "isocitrate" is
entered (other search terms may be included with the necessary boolean logic) in the query
window 130. After the query is run, the Keyword Search Results (KSR) window 135 is
displayed, listing all sequences which contain the queried keyword with the graphical display of
the sequences and their modules. From this window, the user may select a module of interest to
be displayed in the MFS window 140. The MFS window 140 shows the two-dimensional spatial
orientation of the biological data, including visual locations of modules (represented as
schematic boxes) in each of the sequences for a selected family, distinguishably displayed and
positionally aligned, as well as the location of all other modules in those sequences.
Example 3

FIG. 18A depicts the MFS window 140 for one catalog. From the menu bar, additional
catalog views may be selected. In this example, both the genomes and OPgenomes catalogs are
chosen to be viewed as shown in FIG. 18B. This view depicts the MFS window with two
catalogs. From this window the user can begin progressive query-making activity by selecting
modules of interest and viewing them in the MFS windows.
Example 4

FIG. 19 depicts the screen display for the evolutionary tree window 160 as linked to the
MFS window 140 and MSA window 150 with highlighted selections propagated throughout.
Highlighting a section of the tree will automatically propagate the highlighting to the other
windows (MSA and MFS) for the selected sequences.
Example 5

FIG. 20 depicts the screen displays showing interactive and progressive query-making
activity across multiple MFS windows 140. Beginning, for example, with the 358_1 module
MFS window, the user can select the 377_1 for display, the 978_1 for display, and the 371_1 for
display (simultaneously). The windows can be closed or moved about the screen as desired.
Thereafter, the user can continue to select displays from the resulting windows, e.g., the user can
select the 1075_1 module from the 978_1 MFS window for display, and so on. Progressive
querying can be continued through level upon level. Of course, from each MFS window 140, the
MSA 150 of Figure 14, evolutionary tree 160 of Figure 15, or database entry (not shown) can
be displayed as depicted in the flowchart of Figure 6.

It should be understood that the examples and embodiments described herein are for
illustrative purposes only and that various modifications or changes in light thereof will be
suggested to persons skilled in the art and are to be included within the spirit and purview of this
application and the scope of the appended claims. All patents, patent applications, provisional
applications, and publications referred to or cited herein, or for which a claim for benefit of
priority has been made, are incorporated by reference in their entirety to the extent they are not
inconsistent with the explicit teachings of this specification.

CLAIMS

What is claimed is: 



1. A method of searching and displaying biological data such that patterns in the
evolutionary relationships between genomic sequences can be explored, comprising the
following steps:
(a) selecting at least one catalog, wherein the catalog comprises an organized body of related
biological data;
(b) searching the catalog using a probe sequence to obtain a listing of search results
displayed in graphical form which shows the relationship between the probe sequence and each
of the modules that are evolutionarily related to the probe sequence, wherein a module is a
region of a protein sequence;
(c) selecting a module of interest from the search results listing; and
(d) displaying a family which comprises a set of all sequences having the selected module
of interest wherein each sequence of the set includes a corresponding graphical representation
of the various modules of the sequence along its amino acid range.

2. The method of claim 1 further comprising the step of displaying a multiple sequence
alignment, evolutionary tree or database entry for the family.

3. The method of claim 1 wherein the graphical representation comprises a two-dimensional
spatially oriented view for each sequence of the set wherein the modules for each sequence are
represented as schematic boxes and wherein the modules for each sequence are sequentially
ordered along its amino acid range.

4. The method of claim 3 wherein like modules for each sequence of the family are
positionally aligned with other like modules and visibly distinguished.

5. The method of claim 1 further comprising the step of selecting a new module of interest
from the graphical representation of the various modules and displaying a different family which
comprises a set of all sequences having the new selected module of interest.

6. The method of claim 5 wherein each sequence of the set includes a corresponding
graphical representation of the various modules of the sequence along its amino acid range.

7. The method of claim 1 further comprising the step of adding at least one more catalog
so that the graphical display of the family includes a corresponding graphical representation of
the various modules of the sequence for each catalog.

8. The method of claim 7 further comprising the step of selecting a new module of interest
from the graphical representation of the various modules from any of the catalogs and
displaying, for that catalog, a different family which comprises a set of all sequences having the
new selected module of interest.

9. The method of claim 2 wherein the multiple sequence alignment shows the amino-acid
by amino-acid relationship between proteins which are in the same family.

10. The method of claim 9 wherein the hydrophobic, hydrophilic, and amphiphilic residues
are visually distinguished.

11. The method of claim 9 wherein the regions of sequence which are likely to represent
secondary structure breaking positions are visually indicated.

12. The method of claim 9 wherein the predicted surface/interior residues are visually
indicated.

13. The method of claim 9 wherein the predicted secondary structure is shown.

14. The method of claim 9 wherein the secondary structure of experimentally determined
homologs to the family is shown if known.

15. The method of claim 2 wherein the evolutionary tree is displayed to indicate the pattern of
divergence and similarity between individual modules.

16. The method of claim 15 wherein the evolutionary tree is constructed using PAM
distances between individual members of the family.

17. The method of claim 16 wherein the ratio of expressed changes at the DNA level to the
rate of silent changes is displayed.

18. The method of claim 17 wherein separate coloring schemes are used to indicate
branches above or below a user selectable threshold for the ratio.

19. The method of claim 2 wherein the display of the graphical representation of the various
modules of the sequence of a family, the display of the multiple sequence alignment, and the
display of the evolutionary tree are related such that a user selection in any of these displays is
propagated through the other displays.

20. A method of searching and displaying biological data such that patterns in the
evolutionary relationships between genomic sequences can be explored, comprising the
following steps:
(a) selecting at least one catalog, wherein the catalog comprises an organized body of related
biological data;
(b) searching the catalog by keyword to obtain a listing of search results displayed in
graphical form which shows individual protein sequences which match the keyword description,
wherein specific regions of each protein sequence are visibly distinguished as modules;
(c) selecting a module of interest from the search results listing; and
(d) displaying a family which comprises a set of all sequences having the selected module
of interest wherein each sequence of the set includes a corresponding graphical representation
of the various modules of the sequence along its amino acid range.

21. A computer system for searching and displaying biological data such that patterns in
the evolutionary relationships between genomic sequences can be explored, comprising:
input means for selecting at least one catalog, wherein the catalog comprises an organized
body of related biological data;
processing means for searching the catalog using a probe sequence to obtain a listing of
search results displayed in graphical form which shows the relationship between the probe
sequence and each of the modules that are evolutionarily related to the probe sequence, wherein
a module is a region of a protein sequence;
input means for selecting a module of interest from the search results listing; and
display means for displaying a family which comprises a set of all sequences having the
selected module of interest wherein each sequence of the set includes a corresponding graphical
representation of the various modules of the sequence along its amino acid range.

22. A graphical user interface for searching and displaying biological data such that patterns
in the evolutionary relationships between genomic sequences can be explored, comprising:
a first display area for selecting at least one catalog, wherein the catalog comprises an
organized body of related biological data;
a second display area for searching the catalog using a probe sequence;
a third display area which provides a listing of search results displayed in graphical form
which shows the relationship between the probe sequence and each of the modules that are
evolutionarily related to the probe sequence, wherein a module is a region of a protein sequence,
and wherein a module of interest from the search results listing can be selected; and
a fourth display area for displaying a family which comprises a set of all sequences having
the selected module of interest wherein each sequence of the set includes a corresponding
graphical representation of the various modules of the sequence along its amino acid range.

23. A graphical user interface for displaying biological data such that patterns in the
evolutionary relationships between genomic sequences can be explored, comprising:
a display area for displaying a family which comprises a set of all sequences having a
selected module of interest wherein each sequence of the set includes a corresponding graphical
representation of the various modules of the sequence along its amino acid range and wherein
the graphical representation comprises a two-dimensional spatially oriented view for each
sequence of the set wherein the modules for each sequence are represented as schematic boxes
and wherein the modules for each sequence are sequentially ordered along its amino acid range.

24. The graphical user interface of claim 23 wherein like modules for each sequence of the
family are positionally aligned with other like modules and visibly distinguished.

25. A computer readable media containing program instructions for displaying data on a
display device of a computer system, the data being obtained from tables in a database
associated with the computer system, said computer readable media comprising:
first computer program code for selecting at least one catalog, wherein the catalog comprises
an organized body of related biological data;
second computer program code for searching the catalog using a probe sequence to obtain
a listing of search results displayed in graphical form which shows the relationship between the
probe sequence and each of the modules that are evolutionarily related to the probe sequence,
wherein a module is a region of a protein sequence;
third computer program code for selecting a module of interest from the search results
listing; and
fourth computer program code for displaying a family which comprises a set of all
sequences having the selected module of interest wherein each sequence of the set includes a
corresponding graphical representation of the various modules of the sequence along its amino
acid range.

26. A computerized storage and retrieval system of biological information comprising a data
storage means for storing data in a relational database wherein the database comprises tables,
each table having a domain of at least one attribute in common with at least one other table, said
tables comprising:
at least one table for storing all amino acid sequences available in the database;
at least one table for storing all catalogs available in the database;
at least one table for storing all annotations of all families;
at least one table for storing all families of all catalogs;
at least one table for storing all modules of all catalogs;
at least one table for storing all profiles of all families;
at least one table for storing all annotations of all sequences in the database;
at least one table for storing all types of sequence annotations in the database;
at least one table for storing all sequence databases available in the database; and
at least one table for storing all indexed keys of a sequence.

27. A computer system for storing and retrieving biological data comprising:
a relational database for storing biological data comprising a plurality of interrelated tables
wherein each table comprises an attribute having a common domain with an attribute of at least
one other table in the database; and
means for viewing patterns in the evolutionary relationships between genomic sequences
on the basis of the data stored in the relational database.

28. The computer system of claim 27 wherein the database comprises tables, said tables
comprising:
at least one table for storing all amino acid sequences available in the database;
at least one table for storing all catalogs available in the database;
at least one table for storing all annotations of all families;
at least one table for storing all families of all catalogs;
at least one table for storing all modules of all catalogs;
at least one table for storing all profiles of all families;
at least one table for storing all annotations of all sequences in the database;
at least one table for storing all types of sequence annotations in the database;
at least one table for storing all sequence databases available in the database; and
at least one table for storing all indexed keys of a sequence.

29. A computer system for storing and retrieving biological data comprising:
a database comprising tables wherein said biological information is stored such that the
tables are interrelated by having at least one common attribute;
means for viewing patterns in the evolutionary relationships between genomic sequences
on the basis of the data stored in the database.

30. A method of graphically representing on a display device information about long
distance homology between numerous modules, each specific module comprising a common
subsequence, the method comprising the steps of:
selecting a module of interest; and
displaying a set of all proteins in a database possessing said module of interest, each protein
in the set having a graphical view of its modules wherein the selected module of interest and any
other homologous modules at analogous positions are visually distinguished.

31. In a database organized by identifying families of homologous protein sequences within
the database, constructing for each family a multiple sequence alignment, an evolutionary tree,
and ancestral sequences at nodes in the tree, constructing a corresponding multiple alignment
for the DNA sequences that encode the proteins in the protein family, assigning silent and
expressed mutations in the DNA sequences to each branch of the DNA evolutionary tree, a
method of graphically representing on a display device information about long distance
homology between numerous modules, each specific module comprising a common
subsequence, the method comprising the steps of:
selecting a module of interest; and
displaying a set of all proteins in a database possessing said module of interest, each
protein in the set having a graphical view of its modules wherein the selected module of
interest and any other homologous modules at analogous positions are visually
distinguished.
Catalog Window

This window lists the available catalogs.

A catalog is a view of relationships between some or all of the protein sequences in the database.
Different catalogs emphasize different features of the protein sequence database. (For example, one
catalog might emphasize repeat units within proteins, another catalog focuses on alignments which
[illegible] length of genes (the gene product catalog), and another focuses on local [illegible] of
sequences (the modularized catalog).) Catalogs are composed of families of
modules, each module defining a region of a protein sequence. Thus, the families relate regions of
different protein sequences in biologically meaningful ways.

Double clicking on one of the available catalogs displays a query window for that catalog. Alternatively,
you can select a catalog entry with the mouse and click on the Open button on the toolbar.

There is a special catalog, the 'Sequence Entry Catalog', that contains just the individual protein
sequences. This catalog does not have family names or allow you to search sequences (see below); it
provides direct access to individual protein sequences and can be searched by keyword.
Query Window

1. Search by Family identifier

This search window allows you to obtain a specific family [illegible]. The result of this search is the
family window, showing a graphical view of the associated [illegible].
Family window

The family window is the gateway to the navigational power of MasterCatalog. The window shows all of the
sequences in the currently selected catalog that are members of that family. It shows the location of the
modules in each of the sequences for that family (in red), and it also shows the location of all other modules
in those sequences.
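
The proportional placement of modules along each sequence can be sketched as follows. This is a minimal illustration only, assuming integer pixel arithmetic of the kind used by the print routine in Appendix A; the class name ModuleLayout and all parameter names are hypothetical and are not part of MasterCatalog.

```java
public class ModuleLayout {
    /**
     * Horizontal pixel position of a module.
     * xSeq: left edge of the sequence column (pixels); indent: offset of the
     * sequence within the family alignment (residues); seqStart: start of the
     * module within the sequence (residues); dxSeq: width of the sequence
     * column (pixels); maxLength: longest sequence extent (residues).
     */
    public static int moduleX(int xSeq, int indent, int seqStart, int dxSeq, int maxLength) {
        return xSeq + (indent + seqStart) * dxSeq / maxLength;
    }

    /** Pixel width of a module (or a whole sequence) spanning seqLength residues. */
    public static int moduleWidth(int seqLength, int dxSeq, int maxLength) {
        return seqLength * dxSeq / maxLength;
    }

    public static void main(String[] args) {
        // Illustrative values only: a 300-pixel column for sequences up to 600 residues.
        System.out.println("x = " + moduleX(100, 0, 50, 300, 600));
        System.out.println("width = " + moduleWidth(200, 300, 600));
    }
}
```

Scaling by the longest sequence keeps every bar inside the column while preserving the relative positions of modules across sequences.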

You can perform many different tasks from this point: 



1. Display the multiple sequence alignment of the current family

Selecting the MSA button on the window's toolbar shows the multiple sequence alignment (the way in
which the modules are related at the amino acid level).

2. Display the evolutionary tree of the current family

Selecting the Tree button on the window's toolbar shows the evolutionary tree. This indicates the pattern of
divergence/similarity between individual modules, assuming that the distance between modules can be
computed from the similarity in the protein sequences.

3. Display protein sequence information for any member of the family

Double clicking on any sequence in the ID column shows the protein sequence window for that sequence
(including catalog membership, description, and annotations).

4. Select sequences/modules

You can select some or all of the modules possessed by individual sequences. Modules are selected
individually in the summary display with single mouse clicks. All of the modules possessed by a protein
can be selected at once by selecting the module id (?) in the left-hand column.
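
The single-click selection behaviour described in task 4 can be sketched with a bit set. This is a simplified illustration modeled on the toggle logic of the IndexSelection class listed in Appendix A; the class name SelectionSketch and its helper methods are hypothetical, and listener notification and global propagation are deliberately left out.

```java
import java.util.BitSet;

public class SelectionSketch {
    private BitSet selected = new BitSet();

    /**
     * Toggle one module index. If multiple is false and the index is being
     * selected, every other index is deselected first.
     */
    public void toggle(int index, boolean multiple) {
        if (selected.get(index)) {
            selected.clear(index);
        } else {
            if (!multiple) selected = new BitSet();
            selected.set(index);
        }
    }

    public boolean isSelected(int index) { return selected.get(index); }

    /** Number of currently selected indices. */
    public int count() { return selected.cardinality(); }

    public static void main(String[] args) {
        SelectionSketch s = new SelectionSketch();
        s.toggle(3, false);   // plain click selects module 3
        s.toggle(5, true);    // multi-select click adds module 5
        System.out.println("selected modules: " + s.count());
    }
}
```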
2. Search by Sequence

This window allows you to search a protein against a catalog for homologous (evolutionarily related)
protein families. The result of this search is a sequence search summary window showing a graphical view
of the relationship between the probe sequence and the associated families.

Note: This search can take several minutes for large probe sequences. You can abort the search at any
time with the cancel button.



WO 01/20535    PCT/US00/25247

FIGURE 11



1. Sequence search results

This window shows the relationship between the probe sequence chosen for the query and each of the
families that are related. The score is a 'log odds score': the logarithm of the ratio of the probability that
the sequences are related according to the model of evolution to the probability that the similarity arose
by chance.

Double clicking on any module shown in this summary will display the summary window for that family.
The alignment of the probe sequence with the summary can then be displayed.
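
A log odds score of this kind can be illustrated with a short sketch. The two probabilities are taken as given inputs, since the text does not describe how they are computed; the class name LogOddsSketch and the choice of the natural logarithm are assumptions made for illustration only.

```java
public class LogOddsSketch {
    /**
     * Log odds of relatedness: log(pRelated / pChance).
     * pRelated: probability of the observed similarity under the evolutionary model.
     * pChance: probability of the observed similarity arising by chance.
     * The natural logarithm is an assumption; the source does not state the base.
     */
    public static double logOdds(double pRelated, double pChance) {
        if (pRelated <= 0.0 || pChance <= 0.0) {
            throw new IllegalArgumentException("probabilities must be positive");
        }
        return Math.log(pRelated / pChance);
    }

    public static void main(String[] args) {
        // Illustrative probabilities, not values from the patent.
        System.out.println(logOdds(0.02, 0.001));
    }
}
```

A positive score favours an evolutionary relationship, a negative score favours chance similarity, and a score of zero means the two explanations are equally likely.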
3. Search by Keyword

This window provides searches of the protein sequence database by keyword, according to annotations of 
proteins in the original sequence database. Keywords provided include selection by organism, by 
classification, by gene name and by gene product description. The result of this search is a sequence 
summary window showing a graphical view of individual protein sequences which fit the keyword search 
criteria. 
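
A keyword search over annotation fields can be sketched as a simple filter. This is an illustration under stated assumptions, not the MasterCatalog implementation: the Entry class, the field names passed to search, and the case-insensitive substring matching are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class KeywordSearchSketch {
    /** One annotated protein sequence; the fields mirror the keyword types named in the text. */
    public static class Entry {
        final String id, organism, classification, geneName, description;

        public Entry(String id, String organism, String classification,
                     String geneName, String description) {
            this.id = id;
            this.organism = organism;
            this.classification = classification;
            this.geneName = geneName;
            this.description = description;
        }
    }

    /** Return the ids of all entries whose chosen field contains the keyword, ignoring case. */
    public static List<String> search(List<Entry> entries, String field, String keyword) {
        String needle = keyword.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (Entry e : entries) {
            String value;
            switch (field) {
                case "organism": value = e.organism; break;
                case "classification": value = e.classification; break;
                case "gene": value = e.geneName; break;
                case "description": value = e.description; break;
                default: throw new IllegalArgumentException("unknown field: " + field);
            }
            if (value.toLowerCase(Locale.ROOT).contains(needle)) hits.add(e.id);
        }
        return hits;
    }
}
```

The returned ids would then drive the sequence summary window, one row per matching sequence.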



Search results windows 

These windows summarize the search results for search by sequence and search by keyword respectively. 
2. Keyword search results

The keyword search results window shows the sequences. For each sequence, all modules are shown for
the currently selected catalog. Note: In some cases, sequences may match a particular keyword but
no modularization information is available because these sequences were not part of the set included in the
currently selected catalog.

Double clicking on any module shown in the search results will display the summary window for the 
associated family. 
FIGURE 18A

FIGURE 18B

FIGURE 19



APPENDIX A 

© 1999 EraGen Biosciences, Inc. 



Saliwanchik, Lloyd & Saliwanchik, P.A 
2421 N.W. 41st Street, Suite A-1
Gainesville, FL 32606 
(352) 375-8100 






package mastercatalog.util;

import java.util.*;

/**
  The class managing and propagating selections of indices per id. Global
  propagation is possible through maps of indices to global ids.
  @author Lukas Knecht
*/

/*
  $Id: IndexSelection.java,v 1.5 1999/08/19 11:25:46 knecht Exp $

  $Log: IndexSelection.java,v $

  Revision 1.5 1999/08/19 11:25:46 knecht
  Added hasSelected() and getSubsetName().

  Revision 1.4 1999/07/30 08:43:13 knecht
  Added global propagation of selections.

  Revision 1.3 1999/07/19 08:30:09 knecht
  Added multiple toggle.

  Revision 1.2 1999/04/26 13:13:00 knecht
  Modifications for V2.0

  Revision 1.1 1999/02/18 21:42:16 knecht
  Initial revision
*/

public class IndexSelection {
  private static Hashtable selectionTable = new Hashtable();
  private static boolean globalPropagation;
  private BitSet selected;
  private boolean changed; // true if anything changed since the last propagate
  private Vector listeners;
  private int[] indexToId; // maps index to ids for global propagation



  /**
    Create a new empty IndexSelection with a given id.
    @param id the id uniquely identifying this selection.
  */
  protected IndexSelection(int id) {
    selected = new BitSet(); changed = false; listeners = new Vector();
    selectionTable.put(new Integer(id), this);
  }

  /**
    Obtain the index selection belonging to a given id. This is the only
    method to create an IndexSelection.
    @param id the id uniquely identifying the selection. If no IndexSelection
    with this id exists, it is created.
    @return the IndexSelection belonging to the given id.
  */
  public static IndexSelection obtainIndexSelection(int id) {
    IndexSelection selection = getIndexSelection(id);
    if (selection == null) selection = new IndexSelection(id);
    return selection;
  }

  /**
    Get index selection belonging to a given id.
    @param id the id uniquely identifying the selection.
    @return the IndexSelection belonging to the given id or null if no such
    IndexSelection exists.
  */
  public static IndexSelection getIndexSelection(int id) {
    return (IndexSelection) selectionTable.get(new Integer(id));
  }



  /**
    Enable or disable global propagation of selections.
    @param enabled true if global propagation should be enabled, false otherwise.
  */
  public static void setGlobalPropagation(boolean enabled) {
    globalPropagation = enabled;
  }

  /**
    Get the status of global propagation of selections.
    @return true if enabled, false otherwise.
  */
  public static boolean getGlobalPropagation() { return globalPropagation; }

  /**
    Set the mapping from indices to global ids for global propagation of
    the selection. If global propagation is enabled, the selection is
    updated from existing selections.
    @param indexToId the mapping. Setting it to null disables global propagation
    from this IndexSelection.
  */
  public void setIndexToId(int[] indexToId) {
    this.indexToId = indexToId;
    if (globalPropagation && indexToId != null)
      for (Enumeration e = selectionTable.elements(); e.hasMoreElements(); ) {
        IndexSelection other = (IndexSelection) e.nextElement();
        if (other != this && other.indexToId != null)
          for (int i = 0; i < indexToId.length; i++)
            for (int j = 0; j < other.indexToId.length; j++)
              if (indexToId[i] == other.indexToId[j])
                if (other.isSelected(j)) select(i);
                else deselect(i);
      }
  }

  /**
    Get the mapping of indices to global ids for global propagation of
    selection.
    @return the mapping.
  */
  public int[] getIndexToId() { return indexToId; }

  /**
    Get the current selection.
    @return the set of currently selected indices.
  */
  public BitSet getSelected() { return selected; }

  /**
    Remove a previously added listener.
    @param [illegible] the listener to be removed.
  */



  /**
    Select the given index and note any change.
    @param index the index to be selected.
  */
  public synchronized void select(int index) {
    if (!selected.get(index)) { selected.set(index); changed = true; }
  }

  /**
    Select the given indices and note any change.
    @param indices the indices to be selected.
  */
  public synchronized void select(int[] indices) {
    for (int i = 0; i < indices.length; i++) select(indices[i]);
  }

  /**
    Deselect the given index and note any change.
    @param index the index to be deselected.
  */
  public synchronized void deselect(int index) {
    if (selected.get(index)) { selected.clear(index); changed = true; }
  }

  /**
    Deselect the given indices and note any change.
    @param indices the indices to be deselected.
  */
  public synchronized void deselect(int[] indices) {
    for (int i = 0; i < indices.length; i++) deselect(indices[i]);
  }

  /**
    Toggle the selection of the given index, optionally allowing multiple
    selections.
    @param index the index to be toggled.
    @param multiple allows for multiple selections: if false and index is being
    selected, deselects all other indices.
  */
  public synchronized void toggle(int index, boolean multiple) {
    if (selected.get(index)) selected.clear(index);
    else {
      if (!multiple) selected = new BitSet();
      selected.set(index);
    }
    changed = true;
  }



  /**
    Toggle the selection of the given indices, optionally allowing multiple
    selections.
    @param indices the indices to be toggled.
    @param multiple allows for multiple selections: if false and any index is
    being selected, deselects all other indices.
  */
  public synchronized void toggle(int[] indices, boolean multiple) {
    int i;
    BitSet newSelected = selected;
    if (!multiple) {
      for (i = 0; i < indices.length && selected.get(indices[i]); i++);
      if (i < indices.length) newSelected = new BitSet();
    }
    for (i = 0; i < indices.length; i++)
      if (selected.get(indices[i])) newSelected.clear(indices[i]);
      else newSelected.set(indices[i]);
    selected = newSelected; changed = true;
  }

  /**
    Get the selection for a given index.
    @param index the index to be queried.
    @return true if index is selected, false otherwise.
  */
  public synchronized boolean isSelected(int index) {
    return selected.get(index);
  }

  /**
    Check whether there is any selection at all.
    @return true if any index is selected, false otherwise.
  */
  public synchronized boolean hasSelected() {
    int i;
    for (i = 0; i < selected.size() && !selected.get(i); i++);
    return i < selected.size();
  }

  /**
    Propagate the selection to all listeners if it has changed.
    @param source the listener which should not be notified.
  */
  public synchronized void propagate(IndexSelectionListener source) {
    if (changed) {
      changed = false;
      for (Enumeration e = listeners.elements(); e.hasMoreElements(); ) {
        IndexSelectionListener l = (IndexSelectionListener) e.nextElement();
        if (l != source) l.selectionChanged(this);
      }
      propagateGlobally();
    }
  }

  /**
    Globally propagate the current selection state to all other selections.
  */
  private void propagateGlobally() {
    if (globalPropagation && indexToId != null)
      for (Enumeration e = selectionTable.elements(); e.hasMoreElements(); ) {
        IndexSelection other = (IndexSelection) e.nextElement();
        if (other != this && other.indexToId != null) {
          for (int i = 0; i < indexToId.length; i++)
            for (int j = 0; j < other.indexToId.length; j++)
              if (indexToId[i] == other.indexToId[j])
                if (isSelected(i)) other.select(j);
                else other.deselect(j);
          other.propagate(null);
        }
      }
  }

  /**
    Get a simple standard name for a BitSet.
    @param subset the set. May be null.
    @return a (non-unique) name corresponding to the subset.
  */
  public static String getSubsetName(BitSet subset) {
    if (subset == null) return [illegible];
    int mini = 1, maxi = 0;
    for (int i = 0; i < subset.size(); i++)
      if (subset.get(i)) {
        if (mini > maxi) mini = i;
        maxi = i;
      }
    return [illegible] + mini + [illegible] + maxi;
  }
}
package mastercatalog.gui;

i mpor t T.a st-i.-za r. a : eg . ?u : . 

i mpo r c :n«i * l .5 :' r. :\ *_ .1 L oc . :1s . • ; 

impo r c n>» is;; i ; . i ! .?.c . : c i i . 

import ; .v. v. . 

import : av.i . t . * ; 

import . a . • ■ r . ev„-n:. . • ; 

import ; ava 

i mpo r t . : dv . i :•■ . • v. i r r; . * ; 

import .v ..• . •-. : . a r . : • 



/**
  The standard frame showing the data about a family as a table.
  @author Lukas Knecht
*/

/*
  $Id: FamilyFrame.java,v 1.10 1999/08/19 11:25:45 knecht Exp $
  $Log: FamilyFrame.java,v $

  Revision 1.10 1999/08/19 11:25:45 knecht
  Various GUI changes.

  Revision 1.9 1999/08/10 12:55:11 knecht
  Change for new busy indicator.

  Revision 1.8 1999/07/30 08:43:13 knecht
  Various changes and modifications.

  Revision 1.7 1999/07/19 09:44:10 knecht
  No tree and no MSA give error message.

  Revision 1.6 1999/07/19 08:16:20 knecht
  Various GUI changes.

  Revision 1.5 1999/04/26 10:13:19 knecht
  Modifications for V2.0

  Revision 1.4 1999/02/19 10:08:26 knecht
  Added menus.

  Revision 1.2 1999/02/18 21:45:02 knecht
  Various big changes and enhancements for V1.0

  Revision 1.2 1998/12/14 22:19:28 sgc
  Various bug fixes and enhancements, including the export function and MSA ordering

  Revision 1.1 1998/10/02 14:15:46 knecht
  Initial revision
*/

public class FamilyFrame extends MCFrame
  implements Exportable, Printable, IndexSelectionListener, CellClickListener {
  private static final int [illegible] = 0;
  private static final int [illegible] = 1;
  private static final String[] exportFormats = { [illegible] };
  private static FrameTable openFrames = new FrameTable();
  private Family family;
  private BitSet subset;
  private String name;
  private FamilyTableModel model;
  private MCTable table;
  private IndexSelection selection;
  private BitSet oldSelected;
  private int[] memberToSequence;
  private FamilySequence[] extraSequences;
  private JComboBox searches; // the list of available searches

Create a frame showing a family belonging to a certain catalog. 
0;:.-.ra.-, r.o?v»r the ServerConnection to get the available catalogs from. 
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9p.-.r.an, :.uh q the cacalog the family belongs to 

euaran, -amlly the family to be shown ° eA ° ngS t0 * 

?r r " :C SllCS " C 3 BiCSSt id «ntifying Che subset. May be null. 

public FamilyFrame(ServerConnection .^r-.-r Familv < Jm< ' - o-.o 

throws Exception family .jm:.;. , BitSet sunf-t) 

super (server, • . ■ t.- ., !: ... , n , , : . • j . 

this. family = family; this. subset = subset; 

nlrne"! family g«Id ( < n ? exSele "ion ■ ^tSubsetName ( subset , ; 

Catalog ^J™" server get Cat £00^?^ °?f nFrames ' P« < name . this); 
eapTj ^ T tt/ - , ZrY ' ^ etCata io9ById < family. cat Id) ; 

secTule{fami y.gecFuUName{) - subsetName * ■ .. 

catalog, name * ,■ ) . 

memberToSequence = f ami 1 y . gecMemberToSequence ( ) ; 

'Listen to changes in the selection 
table = new MCTableO; 

JTextFielc s new JTexcf ielc( ) ; 

descr . setEdi table { :*..:.-) ; 

if (Capabilities. areAllToolTipsSupportedi ) ) 

descr. setToolTi pText { 
add {descr. ) ; ) ; 

add(new -TScrol 1 Pane { tabl e ) 

model = new FamilyTableModel (catalog/ family) • 
tabl e.setModel (model ) ; u> '' 
table. secContext (server, catalog) • 
tabae.setSelection(selection) ; 

addStandardMenultems (catalog) ; 

toolBar . addSeparator ( ) ; 

final SlowAction - c-- 

, ■ . t l "' = new SiowActionO ( 

public voic act() ( ShOwMSAO; } 

MCAbstractAction -i.- * = 
new MCAbstractAction! 

); PUbUC V ° id *«xon P erformed(ActionEven; » < msaAction. run ( , i ) ' ' 

addToToolBar(msa) ; addToVi ewMenu ( msa . ) ; 

final SlowAction ■ •• .-- - ~ , 

- llh ,. , = = new SlowAction* ) i 

public voia act!) { showTreeO; ) - ion *> < 

MCAbscractAct ion c .-?-? = 

new MCAbstractAction (" . • - , 

} public void actionPer formed ^ActionEvent'- ' ( treeAccion . run ( , ; ) 

McfH!J°° lBar(Cree,; a <*dToVi ewMenu (tree. , • 

MCAbstractAction tI = ' ' ' 

new MCAbstractAccion( . • . ; . . ■« . : - 

j *• 1 *" .' * ■ ■ '. ' " it.'? r "* : • V. i : :i < : * ; <* . . ....... • ii ■ -j 

} public void actionPerformed(ActionEvent „ ( clipSelected ( , , , 
addToTool Bar (clip) • addToV/i ewMenu ( clip. : ); 

^SV nc: ' Gnu = new JMenu c-:..t .:va«-i • 

addToMenuBar ( ca tMenu ) • 
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i tern. addActionListener (new Act ionLi sc ener ( ) { 
public void acc ionPer formed (Acc ionEvent •--) ( 

coggleCacalog ( ( ( JMenuIcem) e.gecSource ( ) ) . getText ( ) ) , 

) 

) ) ; 

addToCurrencMenu ( item] ; 

) 

cacMenu. secEnabled ( server . isXNavigacionAl lowed ( ) } ; 
coolBar . addSeparacor ( ) ; 

JPanel .^c:i-ii:0- = new JPanel (new Gr idLayouc ( 2 . 1 ) ) ,- 
seqPanel . setMaximumSize (new Dimens ion ( 150 . 100)); 
seqPanel . add (new JLabel ( *)); 
searches = new JComboBox ( ) ; 
searches . se LToolTipTexc ( 

seqPanel . add (searches ) ; coolBar . add ( seqPanel ) 
coolBar . addSeparacor { ) ; addPropagateBoxes (seleccion) ; 
Cable . secRowSeleccionAl lowed ( : -. . ^s) ; 
descr . secTexc ( family. descr) . 



  /**
    Show the Tree belonging to this family.
  */
  private void showTree() {
    String search = null;
    int i = searches.getSelectedIndex();
    if (i > 0) search = (String) searches.getItemAt(i);
    TreeFrame frame = TreeFrame.openFrame(server, family, search, subset);
    if (frame != null) frame.appendToHistory();
  }

  /**
    Show the MSA belonging to this family.
  */
  private void showMSA() {
    String search = null;
    int i = searches.getSelectedIndex();
    if (i > 0) search = (String) searches.getItemAt(i);
    MSAFrame frame = MSAFrame.openFrame(server, family, search, subset);
    if (frame != null) frame.appendToHistory();
  }

  /**
    Open a new frame with the selected modules.
    Does nothing if there are no selected modules.
  */
  private void clipSelected() {
    if (selection.hasSelected()) {
      FamilyFrame clipFrame =
        FamilyFrame.openFrame(server, family.getId(), selection.getSelected());
      if (clipFrame != null) clipFrame.appendToHistory();
    }
  }


  /**
    Toggle a column displaying a given catalog.
    @param name the name of the catalog.
  */
  private void toggleCatalog(final String name) {
    try {
      Catalog cat = server.getCatalogByName(name);
      if (cat != null) {
        int i = model.indexOf(cat);
        if (i >= 0) {
          model.removeCatalog(cat); table.removeColumn(i);
        }
        else {
          FamilySequence[] sequence =
            new FamilySequence[family.sequence.length];
          for (i = 0; i < sequence.length; i++)
            sequence[i] =
              server.getFamilySequence([illegible], family.sequence[i].id);
          model.addCatalog(cat, sequence);
          table.addColumn(FamilyTableModel.SEQUENCE_WIDTH);
        }
      }
    }
    catch (Exception e) { UserMessages.show(e); }
  }

Get the matching search ids £ or chis famUy . 

private void getSearches ( ) ( 

if (searches. gecitemCount { ) '= oi Q „ J> ,„ K „ 

searches. addltemC , searches . removeAl 11 terns ( I ; 

for^int : "To. \ • '""••"UN-'— C f ««i ly, ; 

searches . setEnabled (SSS .' length '' > 'uT ! Se " ches • addI 'names | i , , , 
( ^-^raSeq^nc^ 
public void dispose (J < 

^ openFrames. remove (name); super . di spose <> 

Apply che user preferred properties co chis frame. 
- ^Ppi> also the size property. 

table. applyp r opercies iwithSiae) • " ' 

^ if (withSize) packd; 

Gee all child frames. 

?U MCFramefr meU geCChiid ^n() ( 

MCFrame = = mI^HT* ' ??ff 1 1 Wi thPl ^ i* < fami 1 y . gecld ( , * ... 

MCFramef - . n "w MC?rlme?crei ^.^l^ily.i.fidll 'l 1 ); 
Syscem.arraycopy<crees 0 rel 0 tre« ? ^as.lengchj; 
return res; p y (msas - 0- res. trees. length, msas . length) , 

J 

H-LV^.^^^-^i^f- - a ser.ali.e,. 
public void writeParameters(ASCrT<;o^ a i' 



I^a ;? a ;^; family to be^hown. 

May be null BltSeC de£ln >"9 • subset of modules co be displayed. 

P-biic static Family Frame openFrame«ServerConnection „ PW . inc 

if (frame == null, ( ' Pami 1 yFrame ' °P^«n,es .get < f amid * • .. - . su bsetName) 
      try {
        Family family = server.getFamily(famId);
        if (subset != null)
          family = family.cloneSelected(subset);
        frame = new FamilyFrame(server, family, subset);
      }
      catch (Exception e) { UserMessages.show(e); }
    }
    if (frame != null) {
      frame.getSearches(); frame.applyProperties(true); frame.setVisible(true);
      frame.toFront();
    }
    return frame;
  }



  /**
    Create a frame from a command read from a deserializer.
    The string has the format "catalog [illegible] family [illegible]".
    @param server the ServerConnection to use.
    @param in the deserializer to read the command from.
  */
  public static FamilyFrame openFrame(ServerConnection server,
                                      ASCIIDeserializer in)
    throws Exception
  {
    String name = in.readString();
    Catalog catalog = server.getCatalogByName(name);
    if (catalog == null) {
      UserMessages.show([illegible] + name);
      return null;
    }
    in.checkChar([illegible]);
    name = in.readString();
    int familyId = server.getFamilyId(catalog, name);
    if (familyId < 0) {
      UserMessages.show([illegible] + name);
      return null;
    }
    BitSet subset = null;
    if (in.ttype == [illegible]) { in.nextToken(); subset = in.readBitSet(); }
    return openFrame(server, familyId, subset);
  }



  /**
    Get the history text corresponding to the command read from a deserializer.
    @param in the deserializer to read the command from.
  */
  public static String getHistoryText(ASCIIDeserializer in)
    throws IOException, SyntaxException
  {
    StringBuffer text = new StringBuffer();
    String catalog = in.readString(); in.checkChar([illegible]);
    String family = in.readString();
    text.append(family);
    if (in.ttype == [illegible]) {
      in.nextToken(); text.append([illegible]); text.append(in.readBitSet().toString());
    }
    text.append([illegible]); text.append(catalog);
    return text.toString();
  }

  /** Dispose all open frames. */
  public static void disposeAll() { openFrames.disposeAll(); }

  /** Get all open frames. */
  public static MCFrame[] getAll() { return openFrames.getAll(); }

  /**
    Apply the properties to all open frames.
    @param withSize true if the frame size should be set.
  */
  public static void applyPropertiesToAll(boolean withSize) {
    openFrames.applyPropertiesToAll(withSize);
  }

The Printable implementation: •/ 

public void print(PrintJob jcb. Rectangle *,« rt ) ( 
Graphics t = job . ge tGraphi cs ( , ; 

Pont ^r«n-. = Fontbase . getSansSerif { ioi- 
g . setFont ( ssFont, ; 

int : rr.- J : = 

Fon^• r : g ^-!. 1 ; OP ! r pon^H , ' gSt InC Pr ° per ^ 1 e "^^ P er t . es . M ODUL E _FONTSIZ£) 

l. on - ^ *- r a Fontbase .getSansSerif ( . fontSi-e)- 

FontMetrics :m * g . ge tFontMetri cs ( J ; 

3t^na?V te „ labelS an V an9fi ?/ nd thS Width of tne f i"t three columns 
string .jci. - new St r ing | f ami ly . sequence . 1 ength ) - columns 

inc 1 — • ■ -V ii1 -*o? Strin9lfainily sequence - len g t hl ; 

int = 0 ; 

int = 0; 

f °- 1 lodule{i 1< family - sequence - ien ^ ch - < 

-p5f? i i y ;!' qU * nCatil -^etFamilyModulesBylndexf family. getldt) ) 

otringBuffer :i: = new S:nngBuf fe:[) ; 

for (int = 0; j < modules . length; jl*> ( 

if (j > 0) buf. append! ); 
^ buf . append (modules ( j 1 . index ) ; 

label I i I = buf . toSt ring {, ; 

buf = new S t r ingBu f f er ( ) : 

for (int = 0; j < modules . lengch; ( 

if (j > 0) buf. append! ); 

buf . append (modul es ( j }. seqStart + 1); 

buf . append ( • , ; 

^ buf . append (modul est jj .seqStart - modules ( j J . seqLengch ) ; 

range (i J = buf . toSCring () ; 

int = f m. scringWidth (label f i J > .; 

if <w-> dxLabel ) dxLabel = w ; - 

w - fm.stringWidtht family. sequence! i I .name) • 

if (w > dxld) dxld = W; 

w s fm. st ringWidth( range { i )) ; 

if (w > dxRangel dxRange = w; 

/ / compute column positions 

int ^:mct = f m. charWidth ( ) ; 

mt :: i s area.x + dxLabel + space. - 

mt ::' t - -.r.'JL * xld * dxld ♦ space; 

int = -z- r s xHange + dxRange * space; 

int a area, width - xSeq; 

/• compute the number of lines 

int =hi-:«'.?n- = fm.getHeighco * f m. getMaxDescent ( ) • 
for m'T"' * {a " a :» ,ei 9ht / chHeight - 1) / 2 - 2; // count title 
Int'?- I \o ^^^J.^^V. sequence. length; i0 ♦ . lines, t 
if (il > family. sequence. length, il = f ami ly . sequence . length ; 

if IVt** Pa9e £ f n ° ne exists vet and print the title 

if (g 3 = nul!) g = job.getGraphics( , ; 
g. setFont ( Fontbase . getSansSeri f (' fV.'i.n- 14) J- 
g.drawString(getTitle(, * . ... ::J , . „. + 
area.x. area.y * chileight); 

// print the sequences 
int y = area.y + 2 • chHeight; 
cor (int i s io ; i < ^ . + ) { 

y^^cn^eight 6 S€CT:i9MCf " = famil y • sequence ( i J ; 
// print label, id and range 



1) - * 11. 
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> 

} 



g.setFont(ssFont) ; g . setColor (Color .black ) : 
DrawingUtilities . 

drawRightAdjustedStringtg, label [ i J . area . x > dxLabel y) • 
g.drawStrmg (sequence, name, xld. y) ; ' y ' ' 

g. drawstring (range I i I . xRange. y> • 
// print gray bar representing the sequence 
g.setColor(Color.gray) • 

g.fillRect(xSe q . family. indent (i, • dxSeq / f ami ly . maxLeng th. 
Y - 4 chHeight / 10. 

sequence. length • dxSeq / f ami ly . maxLength. 
2 chHeight / 10 ) ; 
g . setFont (moduleFont > ; 

for (int • « 0; j < sequence. module. length; j — ) 
DrawmgUtilities.drawModule 

/ '^"y-indenclil * sequence . modul e ( j j . seqS tart ) 
dxSeq / family . maxLength . y - 9 • chHeight ,'10 * CarC) 

i*^:^^ ' ««2iiy.-ii-ngch. 

a cnjfeight / 10. sequence . module f j J . f amName 
^ sequence. module! j | . f amid family . getldli , s ; 

■' *' print the descriptions 

y «• = chHeight 2; 

cor (in: . = io ; i < ii ; { 

print label, id and description 
g.secFont (ssFont) ; g. setColor (Color . black) • 
DrawingUtilities. ' ' 

drawRightAdjustedString (g. label (i I area * - Hv-i 
g. drawstring (sequence. name xld. y, areax dxLabel . y, ; 

^ g.arawString(sequence.descr. xRange. y) ; 

g.diIpSsen;% P = 9e . and fOCCe dilOCat -" °* ^ "•*-. P-ge 



The IndexSeiectionListener implementation: 

puolic void selectionChangeddndexSelection -•) { 

biwSec = selection . getSel ected () ; 

i« (i >= memberToSequence . length) 

els T " 1 ■ """""Tosequence. length - family . sequence . len 9 th ; 
row - memberToSequence ( i J ; 
^ table. tableChanged( new Tabl eModel Event (model , row, row)}; 

oldSelected = (Bi tSet) selected. clone () ; 

  /** The CellClickListener implementation: */
  public void cellClicked(CellClickEvent event) {
    if (event.getClickCount() == 2 && event.getColumn() == 1) {
      int row = event.getRow();
      if (row < family.sequence.length) {
        SequenceFrame frame =
          SequenceFrame.openFrame(server, family.sequence[row].id);
        if (frame != null) frame.appendToHistory();
      }
    }
  }
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The Exportable interface implementation: •/ 
Public String, g ec E xportForn,ats ( , , return exportror^ats , , 

switch (format) ( 
case (.■•»• •• . 

!or'unt eT - 0 • ; ^^V-^^ultae,, * " • |- 
FamUvSeo^ i-l !* 

out. W ri C e ( :-s eq .n::i 1 r se T nceU,; 

out.write{ _ mod. fainName * 

* (mod . seqstart + l) ♦ 
^ (mod. seqstart * mod. seqLength) - 

out . wri te ( ) • 

J 

break; 

case ; _ •- r ■ . . 

break rite<£amily " COString(n ; 

) 
package mastercatalog.gui;

import mastercatalog.ns.*;
import mastercatalog.gui.event.*;

import java.util.*;

import . i=v. : r.r: . - a.— • * . • i 



/**
  The table model responsible for representation of family information with
  sequences and their modules.
  @author Lukas Knecht
  [illegible] 08-02-1998
*/

/*
  $Id: FamilyTableModel.java,v 1.6 1999/08/19 11:25:45 knecht Exp $

  $Log: FamilyTableModel.java,v $

  Revision 1.6 1999/08/19 11:25:45 knecht
  Various GUI changes.

  Revision 1.5 1999/07/30 08:43:13 knecht
  Various GUI changes and modifications.

  Revision 1.4 1999/07/19 08:16:36 knecht
  Various GUI changes.

  Revision 1.3 1999/04/26 10:13:19 knecht
  Modifications for V2.0

  Revision 1.2 1999/02/18 21:45:02 knecht
  Various big changes and enhancements for V1.0

  Revision 1.2 1998/12/14 22:19:26 sgc
  Various bug fixes and enhancements, including the export function and MSA ordering

  Revision 1.1 1998/10/02 14:15:46 knecht
  Initial revision
*/

public class FamilyTableModel extends MCAbstractTableModel
  implements SequenceDisplayable {

The preferred - width of the sequence columns, 
public final static in: .?? ; *v: r.;>-_ /: y: = 300; 

private Catalog - ,: 1.,:; , / che main catalog to be displayed 
private Family ,, che main family to be displayed 

private String! ). vr :• ..- = { , - • , . . * , 

:>•:. } ; 

private mt() = {30. 100, 200. 150. 80, SEQUENCE WIDTH } • 

private Vector : • .* : M : . .v. . = new Vector () ; 
private Vector :t:w-«:, ru-snc v i = new' Vector <) ; 
private Object!] () ?:cr.r 1,7,. 1. 1.; 
private int(J .*:-:r t.t 1 r.c*n-- ; 



Construct a new family table model. 

Sr..ira:;: ~-x: the catalog the family belongs to. 

S:;-ira.*r. :jm- :•. che family to be displayed. 

public FamilyTableModel (Catalog -.j 1: .1 ;?g , Family f . 
setContext (catalog, family); 



Set the family to be displayed. 

fl::. ..*•«:: the catalog the family belongs to. 

f.ixi !v the family to be displayed. 
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for (int : = 0; i < data. length; i**)"' " ' 
) g .removeAU Elements () otherSequ.nces . removal 1 Events ( ) , 

? ^:aur e ?r" C ,'.L getROWUnt ^UySequence , 

Object,, I:! -- ; ne W < Ob°ec=r6i lengCh; J ~» ^^(J, . modules ( j | . index; . 

rowu! : "! W Selec " bie String, indices,; 
row[ij = sequence . name; 
row (2 I a sequence. descr; 
rowf3] a sequence. getSpecies< ) ; 

* r (J > 0} buf. append { ). 
buf . append (modules I j J . seqScar t - I) ■ 
buf . append { ) ; ' 

buf . append (modules f t 1 seaSrarr ^ ~ . i 

} ^UJ.seqstarc * modules [ j | . seqLengch ) ; 

row[4] = buf.toString();
row[5] = sequence;
return row;

} 

(?:..;• i- ^ ?! " ly Co che mai " catalog column 

@param sequences the extra sequences to be added.

context ,r t h, ^u' 8nt the eXt "' a »•«*»"«. to be added in the 

^"l.^::^* 8 ^---*'^"^-",, iatt) t , 

fireTableRowsInserted(data.length, data.length + extraData.length - 1);
if (sequences != null) {

is^ a r ea e d y C SLTa d ed P 1 1 s ai 'r„ 0 !:° Cher ""^ a catalog which 

the catalog to add. 

modules "oTcae 5 " m * *•*»•»«•• as displayed by this model, but with 

otherCatalogs.addElement(cat.name); otherSequences.addElement(sequence);

/**
 * @param cat the catalog to check for.
 * @return the model column index of the catalog, or -1 if it is not shown.
 */
public int indexOf(Catalog cat) {
    int i = otherCatalogs.indexOf(cat.name);
    if (i >= 0) i += data[0].length;
    return i;

} 



/**
 * Stop displaying another catalog. Removing a catalog which is
 * not displayed is a no-op.
 * @param cat the catalog to remove.
 */
public void removeCatalog(Catalog cat) {
    int i = otherCatalogs.indexOf(cat.name);
    if (i >= 0) {
        otherCatalogs.removeElementAt(i); otherSequences.removeElementAt(i);
    }
}

// The AbstractTableModel implementation.

public String getColumnName(int col) {
    if (col < labels.length) return labels[col];
    return (String) otherCatalogs.elementAt(col - labels.length);
}

// The MCAbstractTableModel implementation.

public String getToolTipTextAt(int row, int col) {
    FamilySequence seq =
        row < data.length ? family.sequence[row] :
        (FamilySequence) extraData[row - data.length][5];
    if (col == 1) return seq.getKeys(SeqKeyGroup.IDENTIFIER_GROUP);
    if (col == 2) return seq.descr;
    if (col == 3) return seq.getSpecies();

return . 

}

public int[] getColumnWidths() { return widths; }

public int getColumnCount() {
    return super.getColumnCount() + otherCatalogs.size();
}

public int getRowCount() {
    if (extraData == null) return super.getRowCount();
    else return super.getRowCount() + extraData.length;
}

public Object getValueAt(int row, int col) {
    if (row < data.length) {
        // columns beyond the base columns hold the sequences of the other catalogs
        int i = col - data[row].length;
        if (i < 0) return super.getValueAt(row, col);
        return ((FamilySequence[]) otherSequences.elementAt(i))[row];
    } else {
        // rows beyond the base rows come from the extra data
        row -= data.length;
        if (col < extraData[row].length) return extraData[row][col];
        else return null;

    }

}

public Class getColumnClass(int col) {
    Class cls = null;
    Object value = getValueAt(0, col);
    if (value != null) cls = value.getClass();
    else {
        // Object.class does not work in all VMs
        try { cls = Class.forName("java.lang.Object"); }
        catch (ClassNotFoundException e) {} // ignore
    }
    return cls;
}



// The SequenceDisplayable implementation.

public int getFamilyId() { return family.getId(); }

public int getIndent(int row) {
    if (row < data.length) return family.indent[row];
    else return extraIndent[row - data.length];
}

public int getMaxLength() { return family.maxLength; }

/**
 * Get the family being displayed.
 * @return the family.
 */
public Family getFamily() { return family; }
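
The row and column dispatch used by getColumnCount(), getRowCount() and getValueAt() in the listing above can be illustrated with a minimal, self-contained sketch. The class ExtraColumnsSketch and its methods addColumn and setExtraRows are hypothetical stand-ins introduced only for this illustration; they are not part of the listing, and the sketch assumes plain arrays in place of the Catalog, Family and FamilySequence types.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the row/column dispatch of a table model with added columns and rows. */
class ExtraColumnsSketch {
    private final Object[][] data;                                  // base rows (assumed non-empty)
    private final List<Object[]> otherColumns = new ArrayList<>(); // one per-row array per added column
    private Object[][] extraData;                                   // rows appended below the base rows

    ExtraColumnsSketch(Object[][] data) { this.data = data; }

    void addColumn(Object[] perRowValues) { otherColumns.add(perRowValues); }

    void setExtraRows(Object[][] rows) { extraData = rows; }

    int getColumnCount() { return data[0].length + otherColumns.size(); }

    int getRowCount() { return data.length + (extraData == null ? 0 : extraData.length); }

    Object getValueAt(int row, int col) {
        if (row < data.length) {
            int i = col - data[row].length;      // index into the added columns
            if (i < 0) return data[row][col];    // a base column
            return otherColumns.get(i)[row];     // an added column
        }
        row -= data.length;                      // index into the extra rows
        return col < extraData[row].length ? extraData[row][col] : null;
    }
}
```

In this sketch an extra row simply yields null for any added column, which mirrors the bounds check on the extra data in the listing.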
package .-n.i s ::-r:.-.^ x. j . _ ;
import ;a\*i .-:■„■... •; 
/**
 * The interface for a class which listens to index selection changes.
 * @author Lukas Knecht
 * @version 01-17-1999
 */
/*
 * $Id: IndexSelectionListener.java,v 1.1 1999/02/18 21:42:29 knecht Rel $
 * $Log: IndexSelectionListener.java,v $
 * Revision 1.1 1999/02/18 21:42:29 knecht
 * Initial revision
 */
public interface IndexSelectionListener {

    /**
     * Notify a listener that an index selection has changed.
     * @param selection the IndexSelection which has changed.
     */
    public void selectionChanged(IndexSelection selection);
}
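
The listener contract declared above can be illustrated with a minimal sketch of the notification pattern. The names SketchSelection and SketchSelectionListener are hypothetical and are used only for this illustration; the sketch assumes that the selection is backed by a java.util.BitSet and that listeners are notified only when a bit actually changes, neither of which is confirmed by the listing.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Hypothetical listener, analogous in shape to IndexSelectionListener. */
interface SketchSelectionListener {
    void selectionChanged(SketchSelection selection);
}

/** Hypothetical selection of indices that notifies its listeners on change. */
class SketchSelection {
    private final BitSet selected = new BitSet();
    private final List<SketchSelectionListener> listeners = new ArrayList<>();

    void addListener(SketchSelectionListener listener) { listeners.add(listener); }

    BitSet getSelected() { return selected; }

    void setSelected(int index, boolean on) {
        if (selected.get(index) == on) return;   // nothing changed, so no notification
        selected.set(index, on);
        for (SketchSelectionListener listener : listeners) {
            listener.selectionChanged(this);
        }
    }
}
```

A renderer could register such a listener and repaint when notified, so that the highlighted modules follow the selection.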
package mastercatalog.gui;

import mastercatalog.ds.*;
import mastercatalog.gui.event.*;
import nws:^rcii.i!(r;.'jci.: . •; 

import java.awt.*;

i mporc ; „i «.•-, . ... -_ . £ • • -_ n - . . 

i mp o r t ; V. i . s i r. ? . • ; 

i moor t ; a va :•: . .«>. mg. - . . » ; 



opening of the corresponding family frame
@author Lukas Knecht



selects 



$Id: FamilySequenceRenderer.java,v 1.8 1999/08/19 11:25:45 knecht Exp $
$Log: FamilySequenceRenderer.java,v $

Revision 1.8 1999/08/19 11:25:45 knecht
Removed drawModule().

Revision 1.7 1999/08/10 12:55:11 knecht
Change for new busy indicator.

Revision 1.6 1999/07/30 08:43:13 knecht
Various GUI changes and modifications.

Revision 1.5 1999/07/19 08:16:29 knecht
Various GUI changes.

Revision 1.4 1999/04/26 10:13:19 knecht
Modifications for V2.0

Revision 1.3 1999/02/18 21:45:02 knecht
Various big changes and enhancements for V1.0

Revision 1.2 1998/12/16 16:24:45 knecht
Added getFGColor() and getBGColor()

Revision 1.1 1998/10/02 14:15:46 knecht
Initial revision



private MCTable clickedTable;
private int clickedX;
private Font font;

private SlowAction openAction;

private IndexSelection selection;

class CellState {

    int famId, indent, maxLength;
    FamilySequence sequence;
    Dimension size;

    CellState() { size = new Dimension(); }

    CellState(SequenceDisplayable displayable, int row,
              FamilySequence sequence, Dimension size) {
        this.sequence = sequence; this.size = size;
        setDisplayable(displayable, row);
    }

    void setDisplayable(SequenceDisplayable displayable, int row) {
        famId = displayable.getFamilyId();
        maxLength = displayable.getMaxLength();
        indent = displayable.getIndent(row);
    }
    // scale (0, maxLength) to (0, size.width)
    int scaleX(int x) { return x * size.width / maxLength; }

    // scale (0, 10) to (0, size.height)
    int scaleY(int y) { return y * size.height / 10; }

    // find a member by its x coordinate
    Module findModule(int x) {
        int i;
        // skip every module whose scaled span does not contain x
        for (i = 0; i < sequence.module.length &&
             (scaleX(indent + sequence.module[i].seqStart) > x ||
              scaleX(indent + sequence.module[i].seqStart +
                     sequence.module[i].seqLength) <= x);
             i++);

        if (i < sequence.module.length) return sequence.module[i];
        else return null;
    }
}



/** Construct a new renderer for family sequences. */
public FamilySequenceRenderer() {
    setBackground(Color.white); setFontSize(10);
    toDraw = new CellState();
    openAction = new SlowAction() {
        public void act() {
            Module clickedModule = clicked.findModule(clickedX);
            if (clickedModule != null) {
                FamilyFrame frame = FamilyFrame.openFrame
                    (clickedTable.getServerConnection(), clickedModule.famId);
                if (frame != null) frame.appendToHistory();
            }
        }
    };
}

/** Set the font size used in rendering the modules. */
public void setFontSize(int size) {

^ font = Fontbase . getSansSeri f ( \ , size); 

/**
 * Set the IndexSelection determining the selected modules.
 * @param selection the IndexSelection belonging to the family displayed
 * by this renderer, or null if there are no selected modules.
 */
public void setSelection(IndexSelection selection) {
    this.selection = selection;
}

public void paint(Graphics g) {
    if (toDraw.sequence != null) {
        getSize(toDraw.size);

        if (toDraw.sequence.wasModularized) g.setColor(Color.gray);
        else g.setColor(Color.lightGray);

        g.drawRect(toDraw.scaleX(toDraw.indent), toDraw.scaleY(4),
                   toDraw.scaleX(toDraw.sequence.length), toDraw.scaleY(2));
        g.fillRect(toDraw.scaleX(toDraw.indent), toDraw.scaleY(4),
                   toDraw.scaleX(toDraw.sequence.length), toDraw.scaleY(2));

        if (font != null) g.setFont(font);

        for (int i = 0; i < toDraw.sequence.module.length; i++) {
            Module module = toDraw.sequence.module[i];
            boolean ofFamily = toDraw.famId == module.famId;
            DrawingUtilities.drawModule
                (g,
                 toDraw.scaleX(toDraw.indent + module.seqStart),
                 toDraw.scaleY(1),
                 toDraw.scaleX(module.seqLength),
                 toDraw.scaleY(9),
                 module.famName, ofFamily,
                 ofFamily && selection != null &&
                 selection.getSelected().get(module.index));
        }

        if (!toDraw.sequence.wasModularized) {
String :e::r = ... - m „;,^- ; .. > . .... . 

            FontMetrics fm = g.getFontMetrics();
            int w = fm.stringWidth(text);
            // center the text horizontally over the sequence bar
            int x = toDraw.scaleX(toDraw.indent) +
                (toDraw.scaleX(toDraw.sequence.length) - w) / 2;
            g.setColor(Color.black);
            g.drawString(text, x, toDraw.scaleY(8));
        }
    }
}



// The TableCellRenderer implementation:

public Component getTableCellRendererComponent(JTable table, Object value,
                                               boolean isSelected,
                                               boolean hasFocus,
                                               int row, int column) {
    toDraw.setDisplayable((SequenceDisplayable) table.getModel(), row);
    toDraw.sequence = (FamilySequence) value;
    return this;
}

// The ClickableCellRenderer implementation:

public void =eliCiickec(MCTable ■ --Mr-n 

if (value « = •. ) ' - ei ^l:ckEvent ...... Object ) { 

if cir^tate eCCliCkCOUnCn =S 1 &i "Action , ( 

clicked = new CellState((SequenceDisplayable) table.getModel(),
                        event.getRow(),
                        (FamilySequence) value,
                        event.getSize());

) 

else if (evenc.gecClickCouncO = = ■> ) ( 

// Get position dependent tooltips:

public String getToolTipText(MCTable table, CellClickEvent event, Object value) {
    String text = null;
    if (value != null) {
        CellState state = new CellState((SequenceDisplayable) table.getModel(),
                                        event.getRow(),
                                        (FamilySequence) value, event.getSize());
        Module module = state.findModule(event.getX());
        if (module != null) {

f? X L% i' '*' V ' * mod ^ie. famName; 
} eLe^extV^* I s text 

> - 

return text; 
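
The coordinate scaling and hit-testing performed by scaleX() and findModule() in the CellState class of the listing above can be illustrated with a minimal, self-contained sketch. The class ScaleSketch and its parameter lists are hypothetical and are introduced only for this illustration; the sketch assumes that module start positions and lengths are given as plain integer arrays, and it returns an index rather than a Module object.

```java
/** Hypothetical stand-in for the scaling and hit-testing done by a cell state. */
class ScaleSketch {

    /** Scale x from the range (0, maxLength) to the range (0, width) with integer arithmetic. */
    static int scaleX(int x, int width, int maxLength) {
        return x * width / maxLength;
    }

    /**
     * Return the index of the first module whose scaled horizontal span
     * contains the pixel coordinate x, or -1 if there is none.
     * The left edge is inclusive and the right edge is exclusive.
     */
    static int findModule(int[] seqStart, int[] seqLength, int indent,
                          int width, int maxLength, int x) {
        for (int i = 0; i < seqStart.length; i++) {
            int left = scaleX(indent + seqStart[i], width, maxLength);
            int right = scaleX(indent + seqStart[i] + seqLength[i], width, maxLength);
            if (left <= x && x < right) return i;
        }
        return -1;
    }
}
```

With a cell 300 pixels wide and a maximum sequence length of 100, a module starting at position 10 with length 20 would occupy pixels 30 up to, but not including, 90 in this sketch.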